Automated detection of smuggled high-risk security threats using Deep Learning

Abstract

The security infrastructure is ill-equipped to detect and deter the smuggling of non-explosive devices that enable terror attacks such as those recently perpetrated in western Europe. The detection of so-called "small metallic threats" (SMTs) in cargo containers currently relies on statistical risk analysis, intelligence reports, and visual inspection of X-ray images by security officers. The latter is very slow and unreliable due to the difficulty of the task: objects potentially spanning fewer than 50 pixels have to be detected in images containing more than 2 million pixels against very complex and cluttered backgrounds. In this contribution, we demonstrate for the first time the use of Convolutional Neural Networks (CNNs), a type of Deep Learning, to automate the detection of SMTs in full-size X-ray images of cargo containers. Novel approaches for dataset augmentation made it possible to train CNNs from scratch despite the scarcity of available data. We report fewer than 6% false alarms when detecting 90% of SMTs synthetically concealed in stream-of-commerce images, an improvement of over an order of magnitude over conventional approaches such as Bag-of-Words (BoWs). The proposed scheme offers potentially super-human performance in a fraction of the time it would take a security officer to carry out visual inspection (processing time is approximately 3.5 s per container image).



N. Jaccard, T.W. Rogers, E.J. Morton, L.D. Griffin∗

1. Department of Computer Science, University College London, UK
2. Department of Security and Crime Science, University College London, UK
3. Rapiscan Systems Ltd., Stoke-on-Trent, UK

Keywords:

Deep Learning; X-ray; Small Metallic Threats; Border Security


∗ Correspondence: L.D. Griffin.
† Please note, we use the term "small metallic threats" as we do not wish to make our research results easily discoverable by malicious actors through keyword searching. However, the threats in question are similar in form to hand drills.
This paper is a preprint of a paper submitted to the Imaging for Crime Detection and Prevention conference and is subject to Institution of Engineering and Technology Copyright. If accepted, the copy of record will be available at IET Digital Library.

At the turn of the 21st century, the modus operandi of terrorist attacks in the West, such as those in Madrid and London, often relied on the use of explosives. However, with the […] pixel image is typical); threats concealed within legitimate cargo can be almost undetectable to the naked eye due to complex or dense obscuration; and the diversity of objects that can be found in a container makes it impossible for officers to learn the complete range of appearances of benign items.

In order to alleviate these issues, we propose the use of computer vision and machine learning techniques for the automated detection of SMTs in single-energy, single-view X-ray cargo images. This approach provides multiple advantages over manual inspection: i) reductions in inspection times of orders of magnitude; ii) improved and potentially super-human detection performance; iii) computing power that can be scaled up to meet increasing volumes of images to inspect; and iv) greatly simplified scanning logistics, thanks to consistent processing times.
However, most state-of-the-art computer vision methods were developed first and foremost for natural imagery (photography), from which X-ray images differ significantly due to their translucency, noise levels, clutter, and skewed perspective [4, 6, 7]. Conventional computer vision methods that rely on "hand-crafted" features designed for natural images are thus unlikely to perform optimally when applied to X-ray images. Rather than adapting existing features, or deriving novel ones, one can instead use representation-learning methods, whereby features that optimize the separation of different image classes are learnt directly from training images. Convolutional Neural Networks (CNNs), part of a family of learning algorithms known as Deep Learning (DL), are representation-learning methods [8] that were recently shown to significantly outperform other computer vision approaches [9]. The main barrier to the application of CNNs to X-ray imagery is the scarcity of training images: threats are rare in Stream-of-Commerce (SoC) images, and acquiring images of staged smuggling attempts is prohibitively costly and time-consuming. In other fields, this issue was addressed by augmenting the training dataset with synthetic examples [10, 11]. In this contribution, we employ a dataset augmentation method in which physically accurate images are synthesised by projecting threats into SoC images [12], enabling the generation of a very large number of de novo examples with highly diverse appearances. We also show that log-transforming input X-ray images significantly improves SMT detection performance.

This paper is structured as follows. First, related research is discussed in Section 2. The methods used, including dataset augmentation, CNN architectures, and performance evaluation, are described in Section 3. Our main findings are presented and discussed in Section 4 before concluding in Section 5.
The urgent need for robust methods to fill the detection capability gap is not being matched by the current research output in automated analysis of X-ray cargo images, which was recently thoroughly documented and reviewed in Ref. [13]. Impressive performance has been reported for the detection of security threats (including SMTs) [7, 14–16] in baggage X-ray images, partly made possible by the small dimensions and limited complexity of bags (e.g. constrained packing and low diversity of objects), as well as by the availability of data-rich, high-resolution imaging modalities, including multi-view and volumetric scanning. In comparison, scenes in cargo container imagery tend to be much larger and more complex, with few constraints on how goods are arranged and a very large and diverse space of possible objects (i.e. any object that physically fits into a cargo container). As such, performance on cargo images is expected to be generally lower than what has been reported for baggage imagery.

Two methods for the automated verification of manifest information based on machine vision algorithms have been described [6, 17]. Zhang and colleagues [6] developed an approach for the classification of X-ray cargo images into 22 categories (e.g. grain, tires) based on a Bag-of-Words learnt from responses to Leung-Malik filters. The correct category was the top prediction of their scheme for 51% of images, and among the top three predictions for 78%. Tuszynski et al. [17] computed a city block distance between the intensity histograms of log-transformed images and those of training images for each of the 92 categories considered. Based on this distance, their scheme was able to verify that a given image was associated with the correct category with 48% accuracy at a 5% false alarm rate, which was a significant improvement over chance.
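The histogram comparison attributed to Tuszynski et al. [17] can be sketched as follows. This is a minimal illustration, not their implementation: it assumes NumPy, transmission images scaled to (0, 1], 64 histogram bins, and a single reference histogram per category (for instance an average over that category's training images); all function names and parameters are hypothetical.

```python
import numpy as np

def log_histogram(image, bins=64, eps=1e-6):
    """Normalised intensity histogram of a log-transformed image.

    Assumes transmission values in (0, 1]; `bins` and `eps` are illustrative
    choices, not parameters reported in [17].
    """
    log_img = -np.log(np.clip(image, eps, 1.0))  # 0 for air, larger for denser cargo
    hist, _ = np.histogram(log_img, bins=bins, range=(0.0, -np.log(eps)))
    return hist / hist.sum()

def city_block(h1, h2):
    """City-block (L1) distance between two histograms."""
    return float(np.abs(h1 - h2).sum())

def rank_categories(image, reference_hists):
    """Category names ordered from closest to farthest reference histogram."""
    h = log_histogram(image)
    return sorted(reference_hists, key=lambda c: city_block(h, reference_hists[c]))

# Toy usage with two hypothetical categories.
refs = {"dense": log_histogram(np.full((8, 8), 0.01)),
        "empty": log_histogram(np.full((8, 8), 0.95))}
print(rank_categories(np.full((4, 4), 0.9), refs))  # ['empty', 'dense']
```

Verifying a declared manifest category would then amount to thresholding the distance to that category's reference, while the sorted list gives top-k category predictions.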
When the same histogram-distance approach was used to predict the category of the imaged container, the correct category was predicted for 31% of containers, and it was among the top five predictions 65% of the time.

Approaches were also proposed for empty container verification, which is useful to avoid unnecessary subsequent processing and to detect "false empties" [18, 19]. Rogers et al. [18] classified cargo container images as empty or non-empty using a Random Forest classifier trained on fixed geometric features (oriented Basic Image Features), image moments, and the coordinates of sampled windows. Using window coordinates as a feature encouraged the classifier to learn location-dependent ranges of appearance. The authors reported 99.3% detection with 0.7% false alarms on SoC images, and 90% detection with 0.5% false alarms for synthetic adversarial examples in which objects equivalent to 1 L of water were placed in empty containers. Andrews et al. [19] used anomaly detection techniques, based on features extracted from the hidden layer of an auto-encoder, to perform the same task, achieving 99.2% accuracy by training the system solely on down-scaled images of empty containers and treating non-empty images as anomalies.

We recently reported the first use of Deep Learning for the detection of cars in complex X-ray imagery, showing that Convolutional Neural Networks (CNNs) significantly outperformed conventional Bag-of-Words (BoW) methods, with a 100% detection rate and fewer than 1-in-454 false alarms raised on containers without a car present [20]. The scheme correctly detected cars even when they were almost completely occluded by other goods. "Small Metallic Threats" (SMTs) are significantly more challenging to detect than cars because of their: i) small form factors, ii) very large number of models and manufacturers, iii) appearance close to that of legitimate cargo, and iv) unrestricted orientation.
We previously presented preliminary results for the detection of SMTs in small …×… patches at a conference, with the additional caveat that the most challenging cases (dense backgrounds) were left out of the analysis [21]. In this contribution, we present results for the automated detection of SMTs in full-size images, with performance evaluated across all types of background. In addition, we explore various network architectures and compare performance between pre-trained and trained-from-scratch CNNs.

Methods

Benign images used for this work were acquired using a Rapiscan Eagle® R60 rail scanner equipped with a 6 MV linac source. Images are 16-bit grayscale, and their size varies between …×… and …×… pixels for 20 ft and 40 ft long cargo containers, respectively. The resolution is ≈… mm pixel⁻¹ in the horizontal direction. The images were randomly sampled from Stream-of-Commerce (SoC) images acquired over several weeks, and can be empty (≈… of the dataset) or contain pallets of commercial cargo, heavy machinery and industrial equipment, household goods, and bulk materials.

SMT images were acquired separately and are part of a proprietary dataset. In total, approximately 700 instances of SMTs were available across all types, models, and poses. The original scans were not used directly; instead, individual instances were extracted to create a database of SMTs, which in turn was used to synthesise de novo examples for training. The synthesis process, based on the multiplicative nature of X-ray transmission image formation, was described elsewhere [18, 21] and has recently been shown to be indistinguishable from real threat imagery [12]. In short, a patch containing a single SMT instance was first cropped out of the full-size image. Pixel-wise segmentation of SMT instances was carried out manually, resulting in an SMT binary mask. Background correction was performed by dividing the cropped patch by the mean intensity of pixels outside of the SMT binary mask. If unrelated objects or structures appeared in the patch (e.g. parts of other SMTs or supporting structures), the corresponding pixels were also ignored during background correction. The SMT instance can then be projected into another X-ray image by intensity multiplication.

Projecting the same SMT instance into different images results in vastly different appearances due to the translucency of X-ray images.
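The background-correction and multiplicative projection steps can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: images hold transmission values in (0, 1], the threat is placed at a caller-chosen offset, pixels outside the SMT mask are set to exactly 1 so that projection leaves the target unchanged there, and the function names are hypothetical.

```python
import numpy as np

def extract_threat(patch, threat_mask, ignore_mask=None):
    """Background-correct a cropped SMT patch.

    The patch is divided by the mean intensity of pixels outside the SMT mask,
    optionally also ignoring unrelated structures flagged in `ignore_mask`.
    Pixels outside the mask are then set to 1 (an assumption of this sketch).
    """
    background = ~threat_mask
    if ignore_mask is not None:
        background &= ~ignore_mask
    corrected = patch / patch[background].mean()
    return np.where(threat_mask, corrected, 1.0)

def project_threat(image, threat, top, left):
    """Insert a background-corrected threat by pixel-wise multiplication.

    Multiplication follows from Beer-Lambert attenuation: the transmissions
    of objects stacked along the same ray multiply.
    """
    out = image.astype(float).copy()
    h, w = threat.shape
    out[top:top + h, left:left + w] *= threat
    return out

# Toy example: a 2x2 "threat" that halves transmission, projected into a
# uniform background of transmission 0.5.
patch = np.full((4, 4), 0.8)
patch[1:3, 1:3] = 0.4
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
threat = extract_threat(patch, mask)
result = project_threat(np.full((6, 6), 0.5), threat, top=1, left=1)
print(result[2, 2], result[0, 0])  # 0.25 0.5
```

Because the same corrected threat is multiplied into whatever lies beneath it, its final appearance depends on the background, which is the source of the diversity noted above.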
The dataset is made more diverse by injecting realistic variations such as intensity scaling and flipping.

In order to train the classification scheme, … SoC images were randomly sampled, and SMT instances were projected into half of them. 75% and 25% of the dataset were used for training and testing, respectively. There was no overlap between training and testing data, neither in the SoC backgrounds used nor in the SMT instances projected.
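One possible way to organise this dataset construction is sketched below: backgrounds and SMT instances are shuffled and split 75/25 independently, so that neither is shared between training and testing, a random half of the backgrounds is marked to receive a threat, and each projected threat is randomly flipped and intensity-scaled. The log-domain scaling (raising transmission to a power) and its range are assumptions of this sketch, as are the function names.

```python
import random
import numpy as np

def disjoint_split(backgrounds, threats, train_frac=0.75, seed=0):
    """Split backgrounds and SMT instances independently into train/test sets."""
    rng = random.Random(seed)
    bg, th = list(backgrounds), list(threats)
    rng.shuffle(bg)
    rng.shuffle(th)
    nb, nt = int(len(bg) * train_frac), int(len(th) * train_frac)
    return (bg[:nb], th[:nt]), (bg[nb:], th[nt:])

def assign_labels(n_images, seed=0):
    """Mark a random half of the sampled backgrounds to receive a projected SMT."""
    labels = [1] * (n_images // 2) + [0] * (n_images - n_images // 2)
    random.Random(seed).shuffle(labels)
    return labels

def augment_threat(threat, rng, scale_range=(0.8, 1.2)):
    """Randomly flip a background-corrected threat and rescale its attenuation.

    Raising transmission to a power a multiplies -log(transmission) by a,
    mimicking a thicker (a > 1) or thinner (a < 1) object; the range is illustrative.
    """
    if rng.random() < 0.5:
        threat = threat[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        threat = threat[::-1, :]   # vertical flip
    return threat ** rng.uniform(*scale_range)

(train_bg, train_th), (test_bg, test_th) = disjoint_split(range(100), range(40))
print(len(train_bg), len(test_bg), len(train_th), len(test_th))  # 75 25 30 10
```

Splitting the two pools separately, rather than splitting the composited images, is what guarantees that no background and no SMT instance seen during training reappears at test time.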

For performance evaluation, it was assumed that images of the negative class (i.e. images without SMTs) would generally produce lower image scores $p_I$ than images of the positive class (i.e. images containing at least one SMT). Various performance metrics were computed based on the $p_I$ scores obtained for images in the test set, including the area under the ROC curve (AUC) and the H-measure. The latter is a variant of the AUC that addresses issues related to the underlying cost functions [22, 23]. In addition to the AUC and H-measure, the false positive rate (FPR) was determined by thresholding $p_I$ using the threshold $t$ that resulted in a 90% detection rate.

The detection of SMTs in X-ray cargo images was implemented as a binary classification task, with benign images (no SMTs) taken as the negative class and SMT images (at least one SMT) taken as the positive class. The image classification scheme is window-based: i) small windows are densely sampled with a stride $s$; ii) windows are classified and given a score $p_{w,i}$ (the confidence that the $i$-th window contains an SMT or part thereof); iii) the whole-image score is computed as the maximum score across all windows, $p_I = \max_i p_{w,i}$; iv) the image class prediction is obtained by comparing $p_I$ with a threshold $t$. Training was thus conducted on a per-window basis, while performance evaluation was carried out on full-size scanner images. For classification by CNNs, the window size was …×… pixels and the stride $s$ was 64 pixels.

Figure 1. Effect of the log-transform on X-ray images of bolt cutters. Image i) shows a photograph of the imaged bolt cutters, while ii) and iii) show the raw intensity and log-transformed images, respectively. Note: bolt cutters are used for illustration; the SMTs of interest are often much smaller.
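The window-based image score and the false positive rate at a 90% detection rate can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: `classify_window` stands in for any window classifier returning $p_{w,i}$, windows that would extend past the image border are simply skipped, and the AUC uses the standard rank (Mann–Whitney) formulation. The H-measure is not reproduced here.

```python
import numpy as np

def window_origins(height, width, win, stride):
    """Top-left corners of densely sampled win x win windows."""
    return [(r, c)
            for r in range(0, height - win + 1, stride)
            for c in range(0, width - win + 1, stride)]

def image_score(image, classify_window, win, stride):
    """Whole-image score p_I: the maximum of the window scores p_{w,i}."""
    return max(classify_window(image[r:r + win, c:c + win])
               for r, c in window_origins(*image.shape, win, stride))

def fpr_at_tpr(neg_scores, pos_scores, tpr=0.9):
    """False positive rate at the threshold t giving a `tpr` detection rate.

    For distinct scores, t is the highest threshold such that at least `tpr`
    of the positive images satisfy p_I >= t.
    """
    pos = np.sort(np.asarray(pos_scores, dtype=float))
    n_detect = int(np.ceil(tpr * len(pos) - 1e-9))  # guard against float round-off
    t = pos[len(pos) - n_detect]
    return float(np.mean(np.asarray(neg_scores) >= t)), float(t)

def auc(neg_scores, pos_scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    neg = np.asarray(neg_scores, dtype=float)[:, None]
    pos = np.asarray(pos_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# Toy usage: a "classifier" that returns the brightest pixel of a window.
img = np.zeros((8, 8))
img[5, 6] = 1.0
print(image_score(img, lambda w: float(w.max()), win=4, stride=2))  # 1.0
```

Taking the maximum means a single confident window is enough to flag a container, which suits small threats that occupy only a tiny fraction of the image.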
When comparing with Bag-of-Words (BoW) approaches, the window size was reduced to …×… pixels and the stride $s$ to 32 pixels to maximize BoW performance.

Prior to classification, images were preprocessed [18, 21]: i) black columns produced by faulty detectors or source misfires were removed; ii) source intensity variations were corrected by normalization based on air intensity values; and iii) salt-and-pepper pixels were replaced by the local median intensity. Raw-intensity experiments use preprocessed images as input. When specified, images were log-transformed prior to classification; this transform is frequently used to facilitate the detection of concealed items by security officers during visual inspection (Fig. 1) and was also previously applied to automated classification [17].

In addition to the computation of the image score $p_I$, a heatmap was generated during classification by mapping the normalized mean window score at each location (across all windows overlapping that location) to pixel values. These visualizations serve two main purposes: i) clarifying the classification decision by approximately localizing detected SMTs (or the source of false positive signals), and ii) guiding further action by the security officer (e.g. physical inspection).
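The preprocessing steps, the log-transform, and the heatmap described above might look as follows in NumPy. All thresholds, the choice of air rows, the 3×3 median neighbourhood, and the sign convention of the log-transform are hypothetical parameters of this sketch, and columns are assumed to correspond to individual source pulses.

```python
import numpy as np

def median3x3(img):
    """Local 3x3 median with edge padding (pure NumPy)."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    shifts = [p[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)]
    return np.median(np.stack(shifts), axis=0)

def preprocess(raw, air_rows, dead_thresh=1e-3, sp_tol=0.5):
    """i) drop black columns, ii) air normalisation, iii) salt-and-pepper removal."""
    img = raw.astype(float)
    img = img[:, img.mean(axis=0) > dead_thresh]            # i) black columns
    img = img / img[air_rows].mean(axis=0, keepdims=True)   # ii) per-column air level
    med = median3x3(img)
    outliers = np.abs(img - med) > sp_tol * np.abs(med)     # iii) isolated spikes
    return np.where(outliers, med, img)

def log_transform(img, eps=1e-6):
    """Negative log of transmission: 0 for air, larger for denser material."""
    return -np.log(np.clip(img, eps, None))

def heatmap(shape, origins, scores, win, threshold=1.0):
    """Mean score of all windows covering each pixel, divided by `threshold`.

    With threshold = t, a value of 1.0 marks the decision threshold.
    """
    acc, cnt = np.zeros(shape), np.zeros(shape)
    for (r, c), s in zip(origins, scores):
        acc[r:r + win, c:c + win] += s
        cnt[r:r + win, c:c + win] += 1
    mean = np.divide(acc, cnt, out=np.zeros(shape), where=cnt > 0)
    return mean / threshold

raw = np.ones((6, 5))
raw[:, 2] = 0.0    # a dead column
raw[3, 1] = 50.0   # a "salt" pixel
clean = preprocess(raw, air_rows=slice(0, 2))
print(clean.shape, float(clean.max()))  # (6, 4) 1.0
```

Dividing the heatmap by the operating threshold mirrors the scaling used for the detection figures, where 1.0 corresponds to the decision threshold.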

Figure 2. Trained-from-scratch (TFS) network configurations evaluated. A. Single-channel input images; B. Two-channel input images; C. Two input images feeding into separate convolutional layer streams.

The main CNNs evaluated in this contribution were trained from scratch (TFS) using the MatConvNet library [24]. Their architecture is based on the very deep networks first described by Simonyan and Zisserman [25], in which multiple convolutional (CONV) layers with small 3×3 filters are stacked in between "max pooling" layers and feed forward into three fully-connected (FC) layers. 11-layer (8 CONV + 3 FC) and 19-layer (16 CONV + 3 FC) variants were explored. For both variants, three configurations were evaluated (Fig. 2): grayscale image input (TFS-A, raw or log-transformed intensities); dual-channel image input (TFS-B, raw and log-transformed intensities); and separate raw and log-transformed inputs to distinct branches of the network (with no weight sharing) whose features are concatenated after the first FC layer (TFS-C). In all cases, the window score $p_{w,i}$ was given by the output of the softmax layer for the positive class.

Batch normalisation (normalising the mean and variance of the inputs to each layer) was used for network regularisation and to speed up training [26]. Weight decay and momentum were fixed at … and 0.9, respectively. The learning rate was decreased from … to … over the course of 30 epochs. The mean image computed across the training set was subtracted from each input image. In addition, images were randomly flipped (horizontally and/or vertically) during training.

In addition to TFS CNNs, pre-trained (PT) networks were also evaluated. Features were extracted from the FC1 and FC2 layers of a VGG-VD-19 [25] model trained on ImageNet (a dataset of natural photographic images), whose architecture is very similar to that of the 19-layer TFS CNN, and were classified using Random Forest classifiers. Input images were resized to …×… and the grayscale channel was replicated twice along the third dimension to match the expected RGB format. For PT CNNs, the window score $p_{w,i}$ was computed as the fraction of trees voting for the positive class.
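The input formats of the three TFS configurations and of the PT baseline can be illustrated with the NumPy sketch below. It only prepares arrays (no network is built or trained); the layer lists restate the standard VGG 11- and 19-layer configurations of Simonyan and Zisserman [25] so that the CONV/FC counts can be checked, while the function names, the log sign convention, and the (channels, height, width) layout are assumptions. The resize step for the PT network is omitted.

```python
import random
import numpy as np

# Standard VGG configurations [25]: integers are 3x3 CONV output channels,
# "M" is max pooling; both are followed by three FC layers.
VGG11 = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def weight_layers(cfg, n_fc=3):
    """Number of weight layers: CONV layers plus the FC layers."""
    return sum(1 for x in cfg if x != "M") + n_fc

def tfs_input(window, config, mean_raw=0.0, mean_log=0.0, eps=1e-6):
    """Build (channels, H, W) inputs for TFS-A, TFS-B or TFS-C.

    mean_raw / mean_log are the training-set mean images for each channel.
    """
    raw = window - mean_raw
    log = -np.log(np.clip(window, eps, None)) - mean_log
    if config == "A-raw":
        return raw[None]
    if config == "A-log":
        return log[None]
    if config == "B":
        return np.stack([raw, log])   # one dual-channel tensor
    if config == "C":
        return raw[None], log[None]   # two inputs for separate CONV streams
    raise ValueError(f"unknown configuration: {config}")

def random_flip(x, rng):
    """Random horizontal and/or vertical flip over the last two axes."""
    if rng.random() < 0.5:
        x = x[..., ::-1]
    if rng.random() < 0.5:
        x = x[..., ::-1, :]
    return x

def pt_input(window):
    """Replicate a grayscale window into three channels (H, W, 3)."""
    return np.repeat(window[:, :, None], 3, axis=2)

def positive_score(logits):
    """Softmax probability of the positive class, used as p_{w,i}."""
    z = np.exp(logits - np.max(logits))
    return float(z[1] / z.sum())

print(weight_layers(VGG11), weight_layers(VGG19))  # 11 19
```

Flipping the grayscale window before calling `tfs_input` keeps the raw and log-transformed streams of TFS-C spatially aligned.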
In addition to CNNs, Bag-of-Words (BoW) features were also evaluated: oriented Basic Image Features (oBIFs) and Pyramid Histograms Of visual Words (PHOW). BIFs are fixed geometric features that classify each pixel of an image into one of seven categories according to local symmetry [27]. For this work, we used the extended formulation (oBIFs), in which the orientation of rotationally asymmetric features is quantized, resulting in 16 new categories, for a total of 23 [28]. The oBIF computation was carried out at four scales (σ = {…}) and two threshold parameters (γ = {…}). These parameters were previously shown to be optimal for the detection of cars in cargo containers [20]. The feature vector for a window was 184-dimensional.

PHOW were proposed as a multi-scale extension of dense SIFT (Scale-Invariant Feature Transform) [29, 30] and are computed as follows: i) computation of dense SIFT for the image considered at four scales (4, 6, 8, and 10 pixel spatial bins); ii) learning of a 300 visual word dictionary by k-means clustering of dense SIFT descriptors; and iii) computation of a two-level pyramid histogram of visual words (…×… and …×… spatial bins). The resulting feature vector was 6000-dimensional.

Random Forest models were used for classification of images based on oBIFs and PHOW features.

Table 1. Performance for the detection of SMTs in X-ray cargo images. For clarity, some results were omitted from the table. "+ Log" denotes that images were log-transformed prior to classification. FPR90 is the false positive rate for a 90% detection rate.

Method                 AUC   H-measure  FPR90
oBIFs                  0.72  0.19       0.72
oBIFs + Log            0.59  0.04       0.88
PHOW                   0.72  0.18       0.75
PHOW + Log             0.73  0.20       0.75
CNN-19-PT-FC1          0.67  0.17       0.86
CNN-19-PT-FC1 + Log    0.61  0.12       0.89
CNN-19-PT-FC2          0.67  0.17       0.85
CNN-11-TFS-A + Log     0.95  0.72       0.13
CNN-11-TFS-B           0.95  0.70       0.15
CNN-19-TFS-A           0.89  0.53       0.47
CNN-19-TFS-A + Log     0.96  0.75       0.09
CNN-19-TFS-B           0.97  0.78       0.06
CNN-19-TFS-C           0.96  0.75       0.10

Figure 3. SMT detection on an empty container using a selection of the algorithms evaluated. Images have been scaled so that a value of 1.0 (red) corresponds to a false positive detection at a 90% true positive rate. The best performing scheme is marked *.

Figure 4. SMT detection on a busy container image that does not contain an SMT, using a selection of the algorithms evaluated. Images have been scaled so that a value of 1.0 (red) corresponds to a false positive detection at a 90% true positive rate. The best performing scheme is marked *.

The SMT detection performance obtained for the different methods evaluated is presented in Table 1 and summarized in Table 2. These results highlight the challenging nature of this classification task.
Overall, Bag-of-Words (BoW) methods performed poorly: the best AUC and H-measure were achieved by PHOW on log-transformed inputs, while oBIFs had the lowest false positive rate at a 90% detection rate (FPR90), at 72%. Interestingly, log-transforming the inputs slightly increased the performance of PHOW but was detrimental to that of oBIFs, potentially due to non-optimal parameter choices.

Pre-trained (PT) CNNs have previously been applied successfully to X-ray imagery and delivered robust baseline performance [16, 20]. However, they generally fared worse than BoW approaches for SMT detection, indicating that generic features that are optimal for natural image classification, and that perform reasonably well for the detection of large objects in X-ray images, are not directly transferable to this task.

In all cases, trained-from-scratch (TFS) CNNs outperformed both BoW methods and PT CNNs. Log-transforming the input was key to achieving improved performance. For example, log-transforming inputs when using a single-channel input (TFS-A) decreased the FPR90 from 47% to 9%. A smaller but still substantial improvement was obtained by using inputs with both raw and log-transformed channels (TFS-B), resulting in a further drop of 3 percentage points in FPR90, to 6%. Surprisingly, the architecture with two separate streams of convolutional layers for raw and log-transformed input images (TFS-C) did not perform better than a single log-transformed input (TFS-A + Log). One could expect that encouraging the network to learn channel-specific features would improve classification, given the difference in appearance between the two channels; a possible explanation is that this much more complex network over-fits the training data. The FPR90 more than doubled when using a shallower network (CNN-11-TFS-B versus CNN-19-TFS-B), indicating that the added depth did not lead to over-fitting in this case.

When processing a benign image of an empty container, the TFS CNNs were the only methods that did not produce excessive false positive signals (Fig. 3). Similarly, when given a benign image of a container loaded with industrial equipment and objects whose appearance closely resembles that of SMTs, PT CNNs and, to a lesser degree, BoW methods generated a very large number of false alarms (Fig. 4). In contrast, only a few image locations had any kind of signal associated with them when using TFS CNNs, and in the case of the dual-channel input variant, no location was above the threshold required to trigger a false alarm.

Examples of successful detections using CNN-19-TFS-B are presented in Figure 5. In most cases, the signal is well-localized and the classification very specific, especially when SMTs are projected into empty containers (Fig. 5.i and ii). The examples where the SMTs are concealed amongst other cargo (Fig. 5.iii–viii) would be very challenging to detect by visual inspection, especially under time pressure.

Figure 5. SMT detection examples using CNN-19-TFS-B. SMTs are deliberately censored by a red rectangle (the dimensions of each rectangle are identical to those of the SMT). i) to iii) show SMTs concealed in the fabric of the container, while iv) to viii) are placed amongst legitimate cargo.

We have proposed a Deep Learning scheme for the detection of "small metallic threats" (SMTs) in X-ray cargo images. Using a novel method for the generation of a suitably large and diverse dataset of physically realistic synthetic images, Convolutional Neural Networks (CNNs) could be trained from scratch. We report a 1-in-17 false alarm rate for 90% detection, which significantly outperforms the other methods evaluated, including classification based on pre-trained CNNs and Bag-of-Words features (Table 2). The processing time using a Titan X GPU was 3.5 s per image on average, which is considerably lower than the time taken by operators to inspect cargo container images.

Table 2. Summary of the best performance obtained for each approach (see Table 1).

Method    AUC   H-measure  FPR90
BoW       0.72  0.19       0.72
CNN-PT    0.67  0.17       0.86
CNN-TFS   0.97  0.78       0.06

The scheme described could potentially result in a step change in SMT detection capability. However, further research is required before it is ready to be deployed in the field. Due to the lack of real images containing SMTs concealed amongst legitimate cargo, we have relied on synthetic images for performance evaluation. While all efforts were made to evaluate the system in a way that is meaningful and as representative of real-world performance as possible (e.g. by using fully disjoint datasets for training and testing, for both the threats projected and the background patches), it is essential for performance to be evaluated on real images showing realistic placement of SMTs.

Acknowledgements

This work was funded by Rapiscan Systems, and by EPSRC Grant no. EP/G037264/1 as part of UCL's Security Science Doctoral Training Centre.

References

[1] J. King, "The security of merchant shipping," Marine Policy, vol. 29, no. 3, pp. 235–245, 2005.
[2] S. F. Weele and J. E. Ramirez-Marquez, "Optimization of container inspection strategy via a genetic algorithm," Annals of Operations Research, vol. 187, no. 1, pp. 229–247, 2010.
[3] K. Archick, US-EU cooperation against terrorism. DIANE Publishing, 2010.
[4] F. D. McDaniel, B. L. Doyle, G. Vizkelethy, B. M. Johnson, J. M. Sisterson, and G. Chen, "Understanding X-ray cargo imaging," Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, vol. 241, no. 1, pp. 810–815, 2005.
[5] J. M. Wolfe, D. N. Brunelli, J. Rubinstein, and T. S. Horowitz, "Prevalence effects in newly trained airport checkpoint screeners: trained observers miss rare targets, too," Journal of Vision, vol. 13, no. 3, p. 33, 2013.
[6] J. Zhang, L. Zhang, Z. Zhao, Y. Liu, J. Gu, Q. Li, and D. Zhang, "Joint Shape and Texture Based X-Ray Cargo Image Classification," in Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2014, pp. 266–273.
[7] M. Baştan, M. R. Yousefi, and T. M. Breuel, "Visual words on baggage x-ray images," in Conference on Computer Analysis of Images and Patterns, ser. CAIP'11. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 360–368.
[8] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." IEEE, 2015.
[10] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition," ArXiv e-prints, 2014.
[11] A. Lerer, S. Gross, and R. Fergus, "Learning Physical Intuition of Block Towers by Example," ArXiv e-prints, 2016.
[12] T. W. Rogers, N. Jaccard, E. D. Protonotarios, J. Ollier, E. J. Morton, and L. D. Griffin, "Threat image projection (tip) into x-ray images of cargo containers for training humans and machines," in IEEE International Carnahan Conference on Security Technology, 2015.
[13] T. W. Rogers, N. Jaccard, E. J. Morton, and L. D. Griffin, "Automated x-ray image analysis for cargo security: Critical review and future promise," arXiv preprint arXiv:1608.01017, 2016.
[14] D. Turcsany, A. Mouton, and T. P. Breckon, "Improving feature-based object recognition for X-ray baggage security screening using primed visual words," in International Conference on Industrial Technology. IEEE, 2013, pp. 1140–1145.
[15] G. Flitton, A. Mouton, and T. P. Breckon, "Object classification in 3d baggage security computed tomography imagery using visual codebooks," Pattern Recognition, vol. 48, no. 8, pp. 2489–2499, 2015.
[16] S. Akçay, M. E. Kundegorski, M. Devereux, and T. P. Breckon, "Transfer learning using convolutional neural networks for object classification within x-ray baggage security imagery," in Proceedings of the International Conference on Image Processing. IEEE, 2016.
[17] J. Tuszynski, J. T. Briggs, and J. Kaufhold, "A method for automatic manifest verification of container cargo using radiography images," Journal of Transportation Security, vol. 6, no. 4, pp. 339–356, 2013.
[18] T. W. Rogers, N. Jaccard, E. J. Morton, and L. D. Griffin, "Detection of cargo container loads from X-ray images," in Proceedings IET International Conference on Intelligent Signal Processing, pp. 6 .–6 .(1), 2015.
[19] J. T. A. Andrews, E. J. Morton, and L. D. Griffin, "Detecting anomalous data using auto-encoders," International Journal of Machine Learning and Computing, vol. 6, no. 1, pp. 21–26, 2016.
[20] N. Jaccard, T. W. Rogers, E. J. Morton, and L. D. Griffin, "Detection of concealed cars in complex cargo X-ray imagery using deep learning," ArXiv e-prints, 2016.
[21] N. Jaccard, T. W. Rogers, E. J. Morton, and L. D. Griffin, "Tackling the X-ray cargo inspection challenge using machine learning," in Proceedings SPIE, vol. 9847, pp. 98470N–98470N-13, 2016.
[22] D. J. Hand, "Measuring classifier performance: a coherent alternative to the area under the roc curve," Machine Learning, vol. 77, no. 1, pp. 103–123, 2009.
[23] D. J. Hand and C. Anagnostopoulos, "A better Beta for the H measure of classification performance," arXiv preprint arXiv:1202.2564, 2012.
[24] A. Vedaldi and K. Lenc, "Matconvnet-convolutional neural networks for matlab," arXiv preprint arXiv:1412.4564, 2014.
[25] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, 2014.
[26] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," arXiv preprint arXiv:1502.03167, 2015.
[27] L. D. Griffin, M. Lillholm, M. Crosier, and J. van Sande, "Basic image features (bifs) arising from approximate symmetry type," in Scale Space and Variational Methods in Computer Vision, ser. Lecture Notes in Computer Science, X.-C. Tai, K. Mørken, M. Lysaker, and K.-A. Lie, Eds. Springer Berlin Heidelberg, 2009, vol. 5567, pp. 343–355.
[28] A. J. Newell and L. D. Griffin, "Natural Image Character Recognition Using Oriented Basic Image Features," in International Conference on Digital Image Computing: Techniques and Applications. IEEE, 2011, pp. 191–196.
[29] A. Bosch, A. Zisserman, and X. Muñoz, "Scene classification via plsa," in Computer Vision–ECCV 2006. Springer, 2006, pp. 517–530.
[30] A. Bosch, A. Zisserman, and X. Muñoz, "Image classification using random forests and ferns," in