A Comparison of Adaptive and Template Matching Techniques for Radio-Isotope Identification
Emma J. Hague a, Mark Kamuda b, William P. Ford a, Eric T. Moore a, and Johanna Turk c

a Remote Sensing Laboratory, Joint Base Andrews, Maryland
b University of Illinois, Urbana-Champaign, Illinois
c Barnstorm Research, Boston, MA
ABSTRACT
We compare and contrast the effectiveness of a set of adaptive and non-adaptive algorithms for isotope identification based on gamma-ray spectra. One-dimensional energy spectra are simulated for a variety of dwell-times and source-to-detector distances in order to reflect conditions typically encountered in radiological emergency response and environmental monitoring applications. We find that adaptive methods are more accurate and computationally efficient than non-adaptive methods in cases of operational interest.
Keywords:
Radiation Detection, Isotope Identification, Data Science, Machine Learning, Neural Network, Template Matching, Likelihood, Spectroscopy
1. INTRODUCTION
Algorithmic gamma-ray spectrum classification and isotope identification is a well-developed field.
In radiological search and environmental health operations there are widely deployed tools based on non-adaptive template matching schemes that supplement a human spectroscopist or even fully automate spectrum classification and isotope identification. In recent years adaptive techniques, such as neural networks and decision trees, have been developed in the research literature.[1, 2] These new techniques have not been directly compared to the historical approaches in order to gauge any increase in accuracy or efficiency. Furthermore, there is a dearth of study pertaining to the often crucially important operational considerations of the dwell-time for collecting each spectrum and the distance between the source and the detector. Both of these operational factors can dramatically influence the classification accuracy and, thus, yield significantly different operational outcomes. In this work, we endeavor to systematically compare two non-adaptive (template matching) techniques to more modern adaptive techniques in the context of realistic operational times and distances, and comment on the benefits of each.
2. DATA
All data in this work are derived from modeled source and background templates, and have been Poisson sampled to generate sufficient training and testing sets. An advantage of using modeled data in this study is that we can precisely control, and know in advance, the true signal strength of each modeled scenario. This knowledge lends confidence and credibility to our conclusions since we will not be guessing, or relying on human spectroscopists, for the truth of each encounter. Another advantage of using modeled data is that we can avoid the class imbalance problems[2] typically associated with data sets collected from controlled laboratory or operational settings. For example, even though Cu is a much less common isotope to encounter in the environment, we will have just as many examples of it as we will the ubiquitous 99mTc. A major disadvantage is that modeling software is known to not fully and correctly reproduce the myriad complications that are routinely found in real-world data. Toward our goal of directly comparing and contrasting adaptive and non-adaptive methods, we assess that the advantages outweigh the drawbacks.
Send correspondence to Emma Hague: E-mail: [email protected], Telephone: +1.301.817.3361

2.1 Templates

Spectral templates are created using the Gamma Detector Response and Analysis Software (GADRAS). The modeled detector is a 2″ × × 16″ NaI(Tl) crystal with 1024 channels in energy. GADRAS generates these templates using historical in-situ data collected in a laboratory setting or, as in the case of geographically specific backgrounds, in the field. The collected data are then convolved with the known detector response functions to produce an asymptotic spectral shape. This shape is asymptotic in the sense that it is modeled as a very long dwell measurement – in our case 24 hours – which is nearly free of channel-wise Poisson noise.∗

The templates are generated with 1024 channels, spanning the gamma-ray energy range from zero to 3 MeV. These shapes are scaled to the dwell-time being studied and employed both for ensemble generation and in the non-adaptive template matching techniques described below. The width in energy of each channel is ≈2.93 keV. The first ten bins are ignored for all of the analysis in order to approximate the low-level energy discriminators often found in fielded detector systems. Thus, the gamma-ray energy spectra are divided into 1014 energy bins from 29.297 keV to 3 MeV.
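The binning arithmetic above can be checked with a short calculation; the channel count, energy range, and number of ignored bins are taken directly from the text:

```python
# Channel binning used for the templates described above:
# 1024 channels spanning 0-3 MeV, with the first ten bins ignored.
n_channels = 1024
e_max_kev = 3000.0

channel_width = e_max_kev / n_channels    # ~2.93 keV per channel
n_ignored = 10                            # approximates a low-level discriminator
e_low = n_ignored * channel_width         # ~29.297 keV analysis threshold
n_analysis_bins = n_channels - n_ignored  # 1014 bins from ~29.3 keV to 3 MeV
```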
Figure 1. Modeled background spectral templates: the gamma-ray spectrum (counts [keV⁻¹ sec⁻¹] versus energy [keV]) typical of 10 cities in the United States (Albuquerque, Baltimore, Charlotte, Washington, D.C., Denver, Indianapolis, Las Vegas, Miami, Pittsburgh, and Salt Lake City). Additionally, we generate background templates where the ratio of each of the naturally occurring radio-isotopes (K, U, and Th) is maximized with respect to its natural range.
In Figure 1, we show the background templates typical of ten different cities in the United States. Additionally, we generate background templates where the ratios of each of the naturally occurring radio-isotopes K, U, and Th are maximized with respect to the range over which they are commonly found in the natural environment. This variety in background templates is included in this study in an attempt to model some of the variability commonly found in operational or in-situ measurements, and thus potentially increase the robustness of the methods developed and studied. We note, however, that the differences in spectral shape between the different background templates are insufficient to fully reflect operational concerns. For example, even the K-maximum background is not as deviant as what might be found after a heavy rain leaches Rn from the soil.

Nine source (i.e. signal) templates are also modeled and are listed in Table 1. The modeled isotopes were chosen to be representative of those commonly used in medical and industrial applications. The modeled activity of each isotope is roughly that often found in medicine and industry, but was further adjusted to give a (mostly) clear signal at short distances and medium dwell-times, as shown in the third column of Table 1. The detector response was modeled with the distance between the source and detector d ∈ { , , , , , } meters, and dwell-times t ∈ { . , . , , , , , , , , , } seconds. This range of distances and dwell-times is typical of those found in radiation health and safety applications. The source-to-detector distance induces a significant decrease in signal strength due to gamma-ray attenuation in air. The last two columns of Table 1 show the effect of attenuation on the signal strength. This attenuation, and the general shape of the source templates when combined with a typical background (Albuquerque), is plotted in Figure 2 for distances of one meter and four meters.

∗ Measurement noise is still evident at the highest energies.

Figure 2. Modeled source spectral templates, when combined with the background typical of Albuquerque, NM. Note that both the vertical axis (counts [keV⁻¹ sec⁻¹]) and the horizontal axis (energy [keV]) are logarithmically scaled in order to highlight the low-energy channels where most of the isotopic spectral peaks are present. The top panel shows the modeled detector response with one meter of separation between each source and the detector, and the bottom shows that with 4 meters of separation. The visible difference between the top and bottom panels illustrates the significant effect atmospheric attenuation has on observed gamma-ray energy spectra.

Isotope   N_s [sec⁻¹]   N_s/N_b [1 m]   N_s/N_b [4 m]   N_s/N_b [8 m]
99mTc     6756.72       1.45            0.18            0.05
Ir        422.314       0.09            0.01            0.003
I         5702.31       1.23            0.15            0.04
Cs        6478.99       1.39            0.15            0.04
Co        8636.7        1.86            0.21            0.05
Ba        8543.61       1.84            0.20            0.05
Eu        8151.16       1.75            0.19            0.05
Th        7596.29       1.63            0.20            0.05
Cu        5657.2        1.22            0.16            0.04

Table 1. The simulated isotopes considered in this study. In the second column we show the number of signal events, N_s, for a dwell-time of one second at a distance of one meter. In the later columns we show the ratio of expected signal counts to expected background counts, N_s/N_b, at one, four, and eight meters. This table illustrates the total number of counts associated with each modeled isotope and the diminishing signal strength at increasing distances.

Ensembles of simulated data were generated for the purposes of both training and testing the various algorithms studied here. For each combination of 13 backgrounds, 9 isotopes, 11 dwell-times, and 6 distances a simulated spectrum is generated by Poisson sampling the appropriate template. This sampling is repeated 1 × 10^n times for each of two ensembles, which we label "training" and "testing," respectively. Thus each ensemble contains 7.7 × 10^(n+3) total simulated spectra; within the two ensembles, each distance, dwell-time, and isotope combination contains 1.3 × 10^(n+1) spectral manifestations.
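The ensemble-generation step described above amounts to Poisson sampling a scaled rate template. A minimal sketch, with a hypothetical flat template standing in for a real GADRAS template:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def sample_spectrum(rate_template, dwell_time, rng):
    """Poisson-sample one spectrum from a template of expected counts/sec
    per bin, scaled to the requested dwell-time."""
    expected_counts = rate_template * dwell_time
    return rng.poisson(expected_counts)

# Hypothetical flat 1014-bin rate template (counts/sec per bin); a real
# template would carry the spectral shape produced by GADRAS.
template = np.full(1014, 0.5)
spectrum = sample_spectrum(template, dwell_time=32.0, rng=rng)
```

Repeating this draw for every background, isotope, dwell-time, and distance combination yields the training and testing ensembles.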
3. METHODS

3.1 Template Matching
Template matching techniques are ubiquitous in gamma-ray spectroscopy and have a long history of use, since the underlying mathematics required to implement them has been available for many decades.
These techniques are non-adaptive in the sense that they are designed to choose the model that best represents the data from a collection of predefined choices, without prior access to any data; they need not be trained.

We can express the PDF of the observed foreground spectrum as a function of gamma-ray energy, E, as a linear combination of the background template and the isotope (i.e. signal) template,

f(E | n_b, n_s) = n_b f_b(E) + n_s f_s(E),   (1)

where n_b and n_s are the fit parameters associated with the number of background and signal events, respectively. The functions f_b(E) and f_s(E) represent the background and signal templates, respectively, normalized such that they each integrate to unity over the range 29.297 keV to 3 MeV.

A binned likelihood objective function is constructed, using the standard approach, from the Poisson probability of observing N_i counts in a bin,

L(n_b, n_s) = ∏_i [f(E_i | n_b, n_s)]^{N_i} e^{−f(E_i | n_b, n_s)} / N_i!,   (2)

where E_i is the energy of each bin and N_i is the observed simulated count in each bin. For numerical effectiveness the monotonic property of the logarithm can be exploited, and parameter-independent terms dropped, such that the actual function to be minimized with respect to the parameters n_b and n_s is

Λ(n_b, n_s) ≡ −ln L(n_b, n_s) = ∑_i [ f(E_i | n_b, n_s) − N_i ln f(E_i | n_b, n_s) ].   (3)

If, and only if, the number of observed counts in each bin is large enough, one may instead minimize the χ² objective function,

χ²(n_b, n_s) = ∑_i [ N_i − f(E_i | n_b, n_s) ]² / f(E_i | n_b, n_s).   (4)

For most of the cases under consideration in this work, and indeed most of the cases encountered in radiation and environmental safety, the number of counts in all bins is insufficient to justify the use of this objective function.
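The fit of Eq. (3) can be sketched as a direct numerical minimization. This is an illustrative example with toy, hypothetical templates (a flat background and a single synthetic peak), not the templates or minimizer used in the study:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, f_b, f_s, counts):
    """Eq. (3): sum_i [ f(E_i) - N_i ln f(E_i) ] with f = n_b*f_b + n_s*f_s."""
    n_b, n_s = params
    model = n_b * f_b + n_s * f_s
    model = np.clip(model, 1e-12, None)  # guard against log(0)
    return np.sum(model - counts * np.log(model))

# Toy normalized templates: flat background plus one narrow "photopeak".
n_bins = 100
f_b = np.full(n_bins, 1.0 / n_bins)
f_s = np.zeros(n_bins)
f_s[40:45] = 0.2  # integrates to unity

# Simulated observation with n_b = 200 background and n_s = 50 signal events.
rng = np.random.default_rng(seed=7)
counts = rng.poisson(200.0 * f_b + 50.0 * f_s)

result = minimize(neg_log_likelihood, x0=[counts.sum(), 1.0],
                  args=(f_b, f_s, counts), method="Nelder-Mead")
n_b_fit, n_s_fit = result.x
```

The fitted n_b and n_s recover the injected event counts to within Poisson fluctuations; in practice this fit is repeated once per candidate isotope template.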
We include its use in this work, however, since this objective function is commonly used in fielded applications and it provides another metric against which we can compare the newer, adaptive, techniques.

Once the objective function(s) is minimized with respect to the free parameters, we choose the isotope template that best represents the data by calculating the Kolmogorov-Smirnov probability for each with respect to the simulated binned data under consideration. We further restrict the choice of best-fit template by not allowing the fraction of fit signal events, n_s, to be less than 5%, since any signal that is sufficiently close to zero would be indistinguishable from background.

3.2 Adaptive Methods

The two adaptive methods explored in this work are dense neural networks (DNN) and convolutional neural networks (CNN).

An example DNN architecture with two hidden layers is shown in the left panel of Figure 3. In this network, each channel of a gamma-ray spectrum is connected to each node in the next layer. These dense connections continue for each layer in the network.

An example of a CNN is shown in Figure 3. The input and output of the CNN are the same as those of the DNN. The difference between the CNN and the DNN is the convolution and max pooling layers. Convolution-layer activations are created by convolving 1-D filters with the previous layer's signal. Max pooling is a sub-sampling operation that attempts to combine low-level features and to encourage spatial invariance by reducing the resolution of the previous layers. After the convolution and pooling layers, the features are flattened into a vector and fed into a fully-connected architecture. The weights of the fully-connected network and the 1-D convolution filters are learned through training.

In general, neural networks have a tendency to memorize their training set in a process called overtraining. An overtrained neural network will tend to incorrectly identify novel data.
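The dense architecture described above can be illustrated as a single forward pass through a toy two-layer network with a softmax output; all sizes and weights here are hypothetical, not the trained networks from this study:

```python
import numpy as np

def dense(x, weights, bias, activation=None):
    """One fully connected layer: every input feeds every output node."""
    z = weights @ x + bias
    return activation(z) if activation is not None else z

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(seed=0)
spectrum = rng.poisson(5.0, size=1014).astype(float)  # toy input spectrum

# Hypothetical weights; in practice these are learned during training.
w1, b1 = 0.01 * rng.normal(size=(128, 1014)), np.zeros(128)
w2, b2 = 0.1 * rng.normal(size=(10, 128)), np.zeros(10)  # background + 9 isotopes

hidden = dense(spectrum, w1, b1, relu)
class_probs = dense(hidden, w2, b2, softmax)  # one probability per class
```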
To prevent this, regularizing hyperparameters were used to prevent overfitting and optimize performance. There is currently no known method to determine, before training, which hyperparameters will have an impact on model performance. Because of this, a number of hyperparameters are typically added to a model and a random hyperparameter search is used to identify those which are important.

Ranges of hyperparameters explored for the DNN and CNN are shown in the left and right panels of Table 2, respectively. A total of 128 networks were trained with randomly chosen hyperparameters from these ranges. The networks were trained on spectra simulated with a source-detector distance of 4 meters. The performance of each network was compared on a separate validation dataset of spectra simulated with the same source-detector distance. In each of these datasets 10 different spectra were simulated for each isotope and integration time. The hyperparameter combination that performed best on this dataset was chosen as the final training values. From these values new networks were trained using spectra simulated with a source-detector distance of 4 meters.

Figure 3. Left: An example representation of a dense neural network (DNN) where the input layers are the counts in each energy bin of the spectrum, the intervening layers are fully connected, and the output layer is softmax for determining the most likely isotope. In the right panel we show a deeper, more complicated convolutional neural network (CNN) where the intervening layers are successive combinations of pooling, one-dimensional convolutions, flattening, and finally dropout and fully connected layers. These representations are drawn from Kamuda, et al.
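The random search described above amounts to drawing independent samples from each hyperparameter range. A minimal sketch, using only the discrete choices that are legible in Table 2:

```python
import random

random.seed(0)

# Discrete DNN/CNN search choices following Table 2.
search_space = {
    "n_layers": [1, 2],
    "n_neurons": [16, 32, 64, 128, 256, 512, 1024],
    "batch_size": [16, 32, 64, 128],
}

def sample_config(space, rng=random):
    """Draw one random hyperparameter combination."""
    return {name: rng.choice(choices) for name, choices in space.items()}

# The study trained 128 networks; each trial gets an independent random draw.
trials = [sample_config(search_space) for _ in range(128)]
```

In a full search each trial would be trained and scored on the validation set, and the best-scoring configuration retained.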
4. RESULTS
There are many factors for evaluating the effectiveness of isotope classification, and different end-users will give different weights to each consideration. Our perspective is primarily an operational one; we are interested in not only class-wise accuracy but also more subtle factors like computational efficiency and false-alarm rate (fall-out). It is well understood that, once properly trained, adaptive techniques are highly computationally efficient to deploy, since they rely on simple operations such as matrix multiplication and functional computation and require no a posteriori minimization. We found this to be true here: while the template matching techniques routinely required multiple hours to complete (even when using state-of-the-art software and running parallelized on 8 CPU cores), the neural networks give results in a matter of seconds.

Our primary results, in terms of isotopic class accuracy, are shown in Figures 4 and 5, which illustrate the effect of increasing dwell-time and distance. We note that in all cases the DNN performs comparably to, or better than, the other methods considered. Furthermore, in the low-statistics regimes of short dwell-times and medium to long distances, the likelihood approach performs notably better than the χ² objective function. Based on this evidence alone it seems reasonable to suggest that currently used software packages might consider switching to the log-likelihood objective function.

The accuracy as a function of time at a distance of four meters for all methods, separated for each isotope class, is shown in Figure 6. We can see that even for long dwell-times and at this relatively close distance the non-adaptive techniques have difficulty accurately selecting Ir. We note that the DNN at this distance fails at selecting background even at long dwell-times; this will be investigated further.
The CNN at this distance, however, performs notably better than the other methods.

DNN:  Number of Layers: 1 - 2 (final: 1); Number of Neurons in Each Hidden Layer: 16 - 1024 (final: 128); Initial Learning Rate: 10^- to 10^- (final: 10^-); L2 Regularization Strength: 10^- to 10^- (final: 10^-).
CNN:  Batch Size: 16, 32, 64, 128 (final: 64); Dense Layers: 1 - 3 (final: 1); Nodes in Each Layer: 10 - 512 (final: 14); Dense Layer L2 Regularization Strength: 10^- to 10^- (final: 10^-).

Table 2. Left: Range of hyperparameter values for the DNN tested in a random search optimization. Right: Range of hyperparameter values for the CNN tested in a random search optimization. A total of 128 networks were trained with randomly chosen hyperparameters from these ranges, and the final values were used for the results of this study.

The spectra simulated at 8 meters most drastically show the generalization performance of each neural network. The bottom left panel of Figure 7 shows that the DNN was overfitting to the training dataset, misidentifying most isotopes as
Ir. Because DNNs do not assume spatial structure (photopeaks, Compton continua) in the data, this structure must be learned during training. If the training dataset is not diverse enough, the DNN may easily overfit to noise in a signal. This could reveal an inherent flaw in using an adaptive method that does not assume local structure in the data. The top left and right panels of Figure 7 show the non-adaptive methods commonly misidentifying Ir, I, Cs, and Cu as background in spectra measured at 8 meters. Because CNNs do use a signal's local spatial structure, they may be more robust when generalizing to new signals. The bottom right panel of Figure 7 shows that, despite the differences from the training dataset, the CNN does generalize well to the spectra measured at 8 meters.

Finally, in Figure 8 we show the full confusion matrices for each method at a distance of 8 meters and a dwell-time of 256 seconds. We note that while all methods tend to favor mis-classifying all isotopes as background, the DNN shows the most potential for extracting accurate information when the signal strength is subtle, but clearly non-zero.
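The row-normalized confusion matrices of Figure 8 report, for each true class, the fraction of that class's ensemble assigned to each predicted class. A minimal sketch with hypothetical labels (0 = background, 1-2 = isotopes):

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """Rows are true classes, columns predicted classes; each row is
    normalized to the fraction of that class's ensemble."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1.0
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.where(row_sums == 0.0, 1.0, row_sums)

# Hypothetical classifier output: class 0 is background, 1-2 are isotopes.
true = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 0, 0, 2]
cm = confusion_matrix(true, pred, n_classes=3)
```

A strong diagonal indicates accurate class-wise identification; mass in the first column reflects the tendency, noted above, to mis-classify isotopes as background.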
Figure 4. The accuracy of each method (χ², Λ, CNN, and DNN) as a function of dwell-time at a distance of 4 meters (left) and 8 meters (right).
Figure 5. The accuracy of each method (χ², Λ, CNN, and DNN) as a function of distance for a dwell-time of 1 second (left) and 32 seconds (right).

Figure 6. The class-wise accuracy of the methods as a function of dwell-time at a distance of 4 meters.

Figure 7. The class-wise accuracy of the methods as a function of dwell-time at a distance of 8 meters.

Figure 8. The confusion matrix for each method at a distance of 8 meters and a dwell-time of 256 seconds.

ACKNOWLEDGMENTS
This work was done by Mission Support and Test Services, LLC, under Contract No. DE-NA0003624 with the U.S. Department of Energy: DOE/NV/03624–0428. This work was also funded by the Consortium for Verification Technology under Department of Energy National Nuclear Security Administration award number DE-NA0002534.
REFERENCES

[1] Kamuda, M., Zhao, J., and Huff, K., "A comparison of machine learning methods for automated gamma-ray spectroscopy," Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment (Jan. 2018).
[2] Sharma, S., Bellinger, C., Japkowicz, N., Berg, R., and Ungar, K., "Anomaly detection in gamma ray spectra: A machine learning perspective," (July 2012). IEEE Xplore: 31 August 2012.
[3] Ford, W. P., Hague, E., McCullough, T., Moore, E. T., and Turk, J., "Threat determination for radiation detection from the remote sensing laboratory," Proceedings of the SPIE, 106440G (2018).
[4] Moore, E. T., et al., "Analysis of gamma-ray spectrum using transfer learning," https://arxiv.org/ (2019). In draft.
[5] Moore, E. T., Ford, W. P., Hague, E. J., and Turk, J. L., "Algorithm development for targeted isotopics," (2018).
[6] Moore, E. T., Ford, W. P., Hague, E. J., and Turk, J. L., "Algorithm development for targeted isotopics," Site-Directed Research and Development - FY 2018 (2019).
[7] Hague, E. J., Ford, W. P., McCullough, T., Moore, E. T., and Turk, J., "Machine learning for gamma spectra," Symposium on Radiation Measurements and Applications, Ann Arbor, Michigan (June 13, 2018).
[8] Horne, S. M., et al., "GADRAS-DRF 18.5 users manual." Accessed: March 2019.
[9] Brun, R., et al., "CERN ROOT data analysis framework." Accessed: March 2019.
[10] Verkerke, W. and Kirkby, D., "The RooFit toolkit for data modeling." Accessed: March 2019.
[11] Hazewinkel, M., "Maximum-likelihood method." Accessed: March 2019.
[12] Scherer, D., Müller, A., and Behnke, S., "Evaluation of pooling operations in convolutional architectures for object recognition," in [Artificial Neural Networks – ICANN 2010], Diamantaras, K., Duch, W., and Iliadis, L. S., eds., 92–101, Springer Berlin Heidelberg, Berlin, Heidelberg (2010).
[13] Bergstra, J. and Bengio, Y., "Random search for hyper-parameter optimization," Journal of Machine Learning Research 13 (2012).