Chronological age estimation of lateral cephalometric radiographs with deep learning
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. XX, NO. XX, XXXX 2017
Ningtao Liu
Abstract — Traditional manual age estimation requires intensive labour on many kinds of X-ray images. Several studies have shown that lateral cephalometric (LC) images can be used to estimate age. However, these methods rely on manually measuring image features and estimating age from experience or scoring, so they are time-consuming, labor-intensive, and affected by subjective opinion. In this work, we propose a saliency map-enhanced age estimation method that automatically estimates age from LC images. It can also show the importance of each region of the image for age estimation, which increases the method's interpretability. Our method was tested on 3014 LC images of subjects from 4 to 40 years old. The MAE of the experimental result is 1.250 years, lower than that of the state-of-the-art benchmark, because our method performs significantly better in the age groups with less data. In addition, our model was trained on each area of the LC image with a high contribution to age estimation, so the effect of these different areas on the age estimation task was verified. We conclude that the proposed saliency map-enhanced chronological age estimation method for lateral cephalometric radiographs works well for chronological age estimation, especially when the amount of data is small, and that, compared with traditional deep learning, it is also interpretable.
Index Terms
I. INTRODUCTION
Age is defined as "the length of time that a person has lived or a thing has existed"; it is one of the important factors in determining a person's identity. Age estimation is widely used in forensic human identification and in criminal and civil proceedings, such as identification of victims after mass disasters, juvenile delinquency, migrants without identity papers, retirement benefits, etc. [1].

Many parts of the human body can be used for age estimation due to aging changes. During the course of a human's skeletal maturation and degeneration, age-related morphological changes take place. Age can be estimated from the evaluation of the size, shape and degree of epiphyseal ossification of bones. Bones from various parts of the body
T. C. Author is with Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi'an 710071, China (e-mail: nt [email protected]).

are used to study bone age estimation, such as the hand-wrist [2], knee [3], foot [4], clavicle [5], and ilium and pubis [6]. Most bones complete their development by the 20s, and there are few studies on the relationship between age-related bone changes and age in adults, except for the pubic bone. The significant variation of pubic symphysis morphology related to bone degeneration lowers the accuracy of age estimation, especially over 40 years of age [7]. The pubis can only be used to estimate the age of a corpse, which limits its use.

Teeth have the advantage of being preserved for a long time after the disintegration of other tissues, even bones, and unlike bones, they can be clinically inspected directly in living individuals. Every tooth has a unique set of features such as shape, pathology, wear pattern, color and location, and the arrangement of teeth varies from person to person, which forms the basis of identification. Tooth and dental characteristics are considered to be among the most valuable personalized characteristics of the human body, and they provide very persuasive evidence for human identification. The principal basis of dental identification is that no two mouths are alike and each person's teeth are unique [8]. These benefits make teeth the preferred organ for forensic age estimation and recognition [9].

Many methods of estimating age from teeth have been established; they are divided into four categories, clinical/visual, morphological, radiologic and biochemical, based on the degradation processes observed in tooth structure [10] [11]. Tooth development and eruption sequence have been widely used in the age estimation of children and adolescents with high accuracy (standard error ±
… methods are not suitable for estimating the age of living persons for ethical, religious, cultural or scientific reasons.

Dental X-ray radiography is one of the main clinical diagnostic tools. The radiographs most commonly used in dental clinics are intraoral periapical radiographs, dental panoramic radiographs and lateral cephalograms. For children and adolescents, the time of emergence of the tooth in the oral cavity and the tooth calcification are readily observed in radiographs. Demirjian introduced the most widely used method, which defined 8 stages of crown and root development on 7 permanent teeth [13]. Cameriere innovatively assessed age in children based on the correlation between age and the measurement of open apices in teeth and obtained good accuracy [35]. In adulthood, when all permanent teeth have been completely formed, radiographic age estimation becomes difficult. The accuracy of measuring the secondary dentin on dental X-ray radiographs to estimate age needs to be improved [36] [37]. These manual age estimation methods are tedious, time-consuming and subjective.

Gender determination is crucial for identification because it reduces the number of possible matches by half. Gender determination is mainly based on sexual dimorphism, the difference in appearance, size and structure between males and females of the same age. The pelvis and skull are traditional gender indicators, and their success rate in gender identification is approximately 100% [38]. The accuracy of traditional methods that infer gender by measuring teeth cannot meet the requirements of forensic identification [39] [40]. These manual age and gender estimation methods are tedious, time-consuming and subjective.
Therefore, a method for automatically estimating age and gender is needed to improve accuracy and repeatability.

Deep learning can analyze medical images intelligently, precisely and quickly, and it has had a profound impact on various fields of medicine. Dou proposed an unsupervised domain adaptation framework with adversarial learning for cross-modality medical image segmentation and achieved good results on cardiac segmentation [41]. Hannun developed a deep neural network (DNN) to classify 12 rhythm classes and proved that an end-to-end deep learning approach can reach diagnostic performance comparable to that of cardiologists [42]. Pratt presented a method for identifying the learned features of a CNN and applied it to the severity diagnosis of DR in fundus images, providing a useful tool to determine the relation between deep learning classification models and clinical diagnosis procedures [43]. At present, there are few studies on inferring age and gender from dental X-ray images with deep learning, and all of them are based on panoramic radiographs [44] [45] [46] [47] [48] [49].

Lateral cephalogram and orthopantomogram (OPG) radiographs are routinely taken for each orthodontic patient for diagnostic and treatment planning purposes. Compared with the panoramic radiograph, the lateral cephalogram contains the entire craniofacial bones and soft tissue. Because of the way the lateral cephalogram is taken, the left and right craniofacial bones and teeth overlap, so the lateral cephalogram can provide more information than the panoramic radiograph. However, no one has done any research to infer age and gender from lateral cephalograms with deep learning.

In this paper, LC images are used for the first time in fully automatic age estimation. We also propose a novel deep learning approach to overcome the insensitivity to changes in samples after adulthood when the aforementioned methods and images are applied to age estimation. Our approach can not only obtain accurate age estimates efficiently and conveniently, but also has strong interpretability, which can verify clinical experience and age estimation rules.

Fig. 1: Examples of some typical unqualified images: (a) age less than 0; (b) incomplete; (c) with restoration; (d) wrong location.
II. MATERIALS AND METHODS
A. Data Set
We obtained a dataset containing 20174 LC images (female and male) from the database of the Stomatological Hospital of Xi'an Jiaotong University Health Science Center, China; the ages of the subjects range from 4 to 40 years old. The subjects were divided into 5-year age groups. The subjects' age and gender distribution are shown in Table I. Images with obvious age errors, wrong imaging locations, or poor image quality were excluded from the dataset. The accuracy of the age labels is guaranteed by the ID card information. The age of each subject was calculated by subtracting the date of birth from the photo date, dividing by 365.25 (due to leap years), and rounding to the nearest hundredth. The bit depth of the images in the dataset is 16 bpp. The size of most images is × .

B. Base Network
After AlexNet demonstrated its ability in natural image processing in the ImageNet competition in 2012 [50], in recent years deep learning algorithms, in particular convolutional
Fig. 2: Step 1: train the EfficientNet-B0 network. Step 2: generate each sample's saliency map using the EfficientNet trained in Step 1. Step 3 (training): in the training phase the age label of the LC image is available; if the sample is not more than 25 years old, the LC image and a copy of it are used as the input of the network; otherwise, the LC image and its corresponding saliency map are used as the input. Step 3 (testing): in the testing phase age labels are not available, and the input to the network cannot be determined directly from age. Therefore, we first use the LC image and its copy as the input of the network. If the estimated age is greater than 25 years old, the LC image and its corresponding saliency map are used as the input to estimate the age again, and the latter is used as the final estimated age; otherwise, the output of the first pass is directly used as the final estimated age without retesting.
TABLE I: The distribution of the data set
Age (years) | Total | Train | Val  | Test | Male | Female | Left  | Right
4-10        | 2599  | 1822  | 393  | 384  | 1264 | 1335   | 2033  | 566
11-15       | 7652  | 5354  | 1148 | 1150 | 3072 | 4580   | 5395  | 2257
16-20       | 4591  | 3211  | 690  | 690  | 1713 | 2878   | 3326  | 1265
21-25       | 3044  | 2137  | 454  | 453  | 815  | 2229   | 2282  | 762
26-30       | 1452  | 1020  | 210  | 222  | 297  | 1155   | 1125  | 327
31-35       | 577   | 408   | 86   | 83   | 102  | 475    | 468   | 109
36-40       | 259   | 190   | 37   | 32   | 39   | 220    | 203   | 56
All         | 20174 | 14142 | 3018 | 3014 | 7302 | 12872  | 14832 | 5342

networks, have rapidly become the methodology of choice for analyzing medical images. Their application range is very wide, covering different tasks (e.g. segmentation [51], classification [52] [53], object detection [54], regression [49] [55]) and different organs (e.g. head & neck [56], lung [57], bone [58]). Compared with other methods, the most significant features of CNNs are shift invariance and space invariance, because the weights in the network are shared as the network performs convolution operations on the image. In this way, the model does not need to learn separate detectors for the same object occurring in different positions in an image. A convolutional neural network usually consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of a series of convolutional layers. At each convolutional layer l, the input is convolved with a set of kernels with weights W^l = {W^l_1, W^l_2, ..., W^l_K} and added biases B^l = {b^l_1, b^l_2, ..., b^l_K}, each generating a new feature map X^l_k. The outputs of the convolution are passed through an element-wise non-linear activation function σ(·) such as the Rectified Linear Unit (ReLU), σ(x) = max(0, x), which is typically followed by a downsampling operation such as a pooling layer. The same process is repeated for every convolutional layer:

X^l_k = σ(W^{l-1}_k * X^{l-1} + b^{l-1}_k)    (1)

The parameter sharing of CNNs also drastically reduces the number of parameters (i.e.
the number of weights no longer depends on the size of the input image) that need to be learned [59].

The hidden layers are considered a feature extractor, usually followed by a fully connected layer, which takes the feature maps extracted by the hidden layers as input and generates the result.

Backpropagation (BP) [60] is a widely used algorithm for training feedforward neural networks. In fitting CNNs, backpropagation computes the gradient of the loss function w.r.t. the weights and biases of the network for a single input–output example.

EfficientNet [61] is a model that scales the depth, width and resolution dimensions of a CNN collaboratively and uniformly under the condition of fixed computing resources. In EfficientNet a compound scaling method was proposed, which uses a compound coefficient φ to uniformly scale network

TABLE II: The structure of EfficientNet-B0
Stage i | Operator F̂_i          | Resolution Ĥ_i × Ŵ_i | Channels Ĉ_i | Layers L̂_i
1       | Conv3×3                | 224×224              | 32           | 1
2       | MBConv1, k3×3          | 112×112              | 16           | 1
3       | MBConv6, k3×3          | 112×112              | 24           | 2
4       | MBConv6, k5×5          | 56×56                | 40           | 2
5       | MBConv6, k3×3          | 28×28                | 80           | 3
6       | MBConv6, k5×5          | 14×14                | 112          | 3
7       | MBConv6, k5×5          | 14×14                | 192          | 4
8       | MBConv6, k3×3          | 7×7                  | 320          | 1
9       | Conv1×1 & Pooling & FC | 7×7                  | 1280         | 1

width, depth, and resolution in a principled way:

depth: d = α^φ
width: w = β^φ
resolution: r = γ^φ
s.t. α · β² · γ² ≈ 2, α ≥ 1, β ≥ 1, γ ≥ 1    (2)

where α, β, γ are constants that can be determined by a small grid search. Intuitively, φ is a coefficient describing how many more resources are available for model scaling, while α, β, γ specify how to assign these extra resources to network depth, width, and resolution respectively. In this paper, EfficientNet-B0 is used as our base network. The main block of EfficientNet-B0 is the mobile inverted bottleneck MBConv [62] [63], to which the squeeze-and-excitation optimization [64] is also added. The structure of EfficientNet-B0 is shown in Table II.
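The compound scaling rule above can be sketched numerically. The values of α, β and γ below are the grid-search results reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15); they are an assumption here, since this document elides them:

```python
# Sketch of EfficientNet compound scaling: given a compound coefficient phi,
# depth, width and resolution are scaled as d = alpha**phi, w = beta**phi,
# r = gamma**phi, subject to alpha * beta**2 * gamma**2 ~= 2.
# ALPHA, BETA, GAMMA are the EfficientNet paper's grid-search values,
# assumed here because this document does not state them.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: float) -> tuple:
    """Return the (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# The constraint alpha * beta^2 * gamma^2 ~= 2 means that each increment of
# phi roughly doubles the FLOPs of the scaled network.
flops_growth_per_phi = ALPHA * BETA ** 2 * GAMMA ** 2
```

With φ = 0 this reproduces the unscaled B0 baseline (all multipliers equal to 1), which is the configuration used in this paper.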
Compared with traditional methods, CNNs have significant advantages in addressing medical and health care problems. But this also brings some serious challenges, the most important of which is model interpretability and explainability [65], especially in medicine and health care. Visualizing the intermediate process and results of a model is a common way to improve its interpretability.

In this paper, we used Grad-CAM [66] as the visualization approach, without modifying the structure of the base network. In Grad-CAM, the global average of the gradient of each feature map is used as its weight. The weight α_k of the k-th feature map A^k input to the fully connected layer, with respect to the output ŷ, is

α_k = (1/Z) Σ_i Σ_j ∂ŷ/∂A^k_ij    (3)

where Z is the number of pixels in the feature map, ŷ is the output of the network, and A^k_ij is the value of pixel (i, j) in the k-th feature map. The saliency map M is obtained by calculating the weighted sum of the feature maps, followed by a ReLU:

M = ReLU(Σ_k α_k A^k)    (4)

The process of generating the saliency map is shown in Step 2 of Fig. 2.
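Equations (3) and (4) amount to global-average-pooling the gradients into per-channel weights and then taking a ReLU of the weighted sum of feature maps. A minimal NumPy sketch, assuming the feature maps and their gradients have already been extracted from the trained network (e.g. via a backward hook); the function name is illustrative:

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM saliency map.

    feature_maps: (K, H, W) activations A^k of the chosen layer.
    gradients:    (K, H, W) gradients of the network output w.r.t. A^k.
    Returns an (H, W) non-negative saliency map M.
    """
    # Eq. (3): alpha_k = (1/Z) * sum_ij dy/dA^k_ij (global average pooling
    # over the Z = H*W pixels of each feature map).
    alphas = gradients.mean(axis=(1, 2))              # shape (K,)
    # Eq. (4): M = ReLU(sum_k alpha_k * A^k)
    cam = np.tensordot(alphas, feature_maps, axes=1)  # shape (H, W)
    return np.maximum(cam, 0.0)
```

The ReLU keeps only the regions whose activations push the output up, which is why the resulting map highlights the areas that contribute positively to the age estimate.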
D. Age Estimation with Saliency Map
Sample imbalance is common in many tasks. In the LC image age estimation task, the number of samples of each chronological age varies greatly, as shown in Table I. This imbalance is also reflected in the performance of the base model when estimating the age of LC images of each chronological age.

In order to alleviate the problem of data imbalance and improve the overall performance of the model, we considered how to apply the knowledge learned by the model on all training data to the samples in the age ranges with a small number of samples. Inspired by the localization ability shown when CAM is applied to weakly-supervised object localization and fine-grained recognition in [67], we believe that the saliency map generated by Grad-CAM from the trained model contains knowledge from all training samples, which serves as attention for samples in the age ranges with a small number of samples.

In addition, in practical applications, the interpretability of the method is crucial. Users need to know from which parts of the LC image the age estimation results are derived and the contribution of each part. The saliency map describes and visualizes the contribution of each area of the image to the output, which is of great significance to our understanding of the working mechanism of the network. Therefore, in this paper, the saliency map of samples of each age was analyzed and checked for consistency with proven age-related changes.

We compared the performance of models that use LC images as input with models that use LC images and the corresponding saliency maps generated by Grad-CAM as input.
We observed that although the overall performance of the model trained with saliency maps on the test set is not as good as that of the model trained with LC images alone, the former performs significantly better in the age groups with sparse samples, as shown in Fig. 5. Between the ages of 4 and 25, when only LC images are used as input, the MAE is lower than when both LC images and the corresponding saliency maps are used, while between the ages of 26 and 30, which is also the age range in which traditional methods can hardly estimate age, using both the LC image and the corresponding saliency map as input performs better.

Therefore, during training, we simply copy the LC image as input for samples from 4 to 25 years old, while for samples from 26 to 40 years old we concatenate the LC image and the corresponding saliency map as input, because the age label is available.

However, during testing the age label is not available, so we applied a different strategy: first, the LC image and its copy are used as input to obtain a predicted age. If the predicted age is greater than 25, the LC image and the corresponding saliency map are concatenated as the input of the network and the model is run again, and the result of the retest is regarded as the final estimated age; otherwise, the result of the first pass is directly used as the final estimated age. We call this strategy retest.
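The input-selection and retest logic above can be sketched as follows. `model`, `build_input` and the two-channel stacking layout are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

AGE_THRESHOLD = 25  # boundary between the copy and saliency-map inputs

def build_input(image: np.ndarray, saliency: np.ndarray, use_saliency: bool) -> np.ndarray:
    """Stack the LC image with its saliency map (or with a copy of itself)
    along the channel axis, so the network input shape is always the same."""
    second = saliency if use_saliency else image
    return np.stack([image, second], axis=0)

def estimate_age(model, image: np.ndarray, saliency_map: np.ndarray) -> float:
    """Test-time 'retest' strategy: the label is unknown, so first predict
    with (image, copy); only if that prediction exceeds the threshold,
    re-predict with (image, saliency map). `model` is a placeholder for the
    trained network, `saliency_map` for its Grad-CAM output."""
    first = model(build_input(image, saliency_map, use_saliency=False))
    if first <= AGE_THRESHOLD:
        return first
    return model(build_input(image, saliency_map, use_saliency=True))
```

During training the branch is chosen directly from the known age label; the retest indirection is only needed at test time, where age must first be guessed.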
E. Regional Comparison
In forensic practice, it is very common that not all parts of the LC image can be obtained. Therefore, it is necessary to measure the effect of each area of the LC image on age estimation. First, the mean saliency map for each age is calculated as follows:

M_i = (1/N_i) Σ_{n=1}^{N_i} m_n    (5)

where M_i is the mean saliency map of age i, i ∈ {4, 5, ..., 39, 40}, N_i is the number of samples with age i in the training set, m_n is the saliency map of the n-th sample, and Σ denotes element-wise summation. From the mean saliency maps shown in Fig. 11, we can see that the most significant parts of the LC image are mainly concentrated in three regions. Each LC image was divided into these three overlapping parts with the same strategy, because when the LC image is taken, the position and posture of the subject relative to the imaging device are almost fixed. As shown in Fig. 3, for a specific LC image with width W and height H, let the upper left corner of the LC image be the origin of the coordinates, right and down the positive directions of the x-axis and y-axis, and the width of one pixel the unit length. In this coordinate system, the coordinates of the upper left and lower right corners of the rectangle of the skull part are (0, 0) and (W, H + 100); if the orientation is left, the coordinates of the upper left and lower right corners of the rectangle of the tooth part are (0, H −) and (W + 100, H), and those of the rectangle of the spine part are (W −, H −) and (W, H); if the orientation is right, each part is mirror-symmetrical to those of a left-facing image.

In order to compare their performance in age estimation and provide guidance for forensic practice, each part of the sample was used as the input of the model.
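Eq. (5) is a per-age element-wise average over the training-set saliency maps; a small sketch (function and variable names are illustrative):

```python
import numpy as np
from collections import defaultdict

def mean_saliency_maps(saliency_maps, ages):
    """Eq. (5): element-wise average of the training-set saliency maps of
    each chronological age. `saliency_maps` is a sequence of (H, W) arrays
    and `ages` the matching integer age labels."""
    sums = defaultdict(lambda: 0.0)
    counts = defaultdict(int)
    for m, age in zip(saliency_maps, ages):
        sums[age] = sums[age] + m  # element-wise summation
        counts[age] += 1
    return {age: sums[age] / counts[age] for age in sums}
```

The resulting dictionary maps each age i to M_i, which is what the three salient regions were read off from.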
In order to analyze the contribution of each region of the local image to age estimation more accurately, and to verify the reliability of the salient areas of the saliency map of the entire image, saliency maps of each local image were also generated.

TABLE III: Take the upper left corner of the left-facing LC image as the origin of the coordinates, right and down as the positive directions of the x-axis and y-axis, and the width of a single pixel as the unit length to establish a coordinate system; the table gives the coordinates (x, y) of the upper left and lower right corners of each part. The boxes of the right-facing LC image are mirror-symmetrical with those of the left-facing LC image. PAR.: Part; POS.: Position.

PAR. | Upper Left | Lower Right
A    | (0, H −)   | (W + 100, H)
B    | (0, 0)     | (W, H + 100)
C    | (W −, H −) | (W, H)

Fig. 3: The green, red and orange boxes are the boundaries of A, B and C respectively. The coordinates of the upper left and lower right corners of each box are also displayed in the corresponding colors.
Fig. 4: The output of EfficientNet-B0 includes age and gender; the gender output is mapped to a value between 0 and 1 by the sigmoid function. If this value is closer to 0, the gender of the input image is considered to be male, and if it is closer to 1, it is considered to be female (because the label value of male is 0 and that of female is 1).
F. Age Estimation with Gender Classification
Due to differences in male and female development, LC images of the same age are bound to differ, which is especially obvious before adulthood. The age labels may be the same while the characteristics of the images are different, or the characteristics may be similar while the age labels are different. This inconsistency caused by gender will inevitably make the neural network optimize in different directions. Therefore, adding a gender classification task to age estimation enables the neural network to distinguish between genders, thereby achieving better age estimation performance. At the same time, gender classification is also crucial for forensic practice. We therefore modified the output and loss function of the EfficientNet-B0 model so that the model can estimate age and classify gender at the same time, as shown in Fig. 4.
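The two-output head of Fig. 4 can be decoded as sketched below; the function is a hypothetical post-processing step, with the sigmoid mapping, the 0.5 threshold and the male = 0 / female = 1 coding taken from the paper:

```python
import numpy as np

def sigmoid(x: float) -> float:
    """Map a raw network output to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def decode_outputs(age_output: float, gender_logit: float, threshold: float = 0.5):
    """Decode the two network outputs: age is a direct regression value;
    gender is mapped through a sigmoid and thresholded (values below the
    threshold are read as male = 0, values above as female = 1)."""
    p_female = sigmoid(gender_logit)
    gender = "female" if p_female >= threshold else "male"
    return age_output, gender, p_female
```

Keeping the gender output continuous until the final threshold is what later allows the threshold-free ROC-AUC evaluation described in Section III.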
III. EXPERIMENTS AND PERFORMANCE EVALUATION
The samples of each age are split into training, validation and test sets at a ratio of 7 : 1.5 : 1.5, so that each set contains samples of each age proportionally, avoiding a per-age sample distribution that is inconsistent with the overall distribution.

The weights of the model are randomly initialized from a normal distribution. In this paper, the model is trained with mini-batches; that is, in each training iteration, batch_size samples are input to the model, and the error of the predicted values is calculated by the loss function. In our approach, the loss functions for age estimation and gender classification are L_age and L_gender respectively, defined as follows:

L_age = (1/n) Σ_{i=1}^{n} |ŷ_a,i − y_a,i|    (6)

L_gender = sqrt( (1/n) Σ_{i=1}^{n} (ŷ_g,i − y_g,i)² )    (7)

where n is the number of samples in a batch (the batch size), y_a and y_g are the age label and gender label of the LC image, respectively, and ŷ_a and ŷ_g are the age prediction and gender prediction of the network, respectively.

The loss is propagated back through the network by the BP algorithm mentioned in Section II, and the gradient of each parameter of the network w.r.t. the loss of the current training iteration is calculated. The optimizer used in our work is Adam [68], and its weight decay is set to . Different from traditional stochastic gradient descent, Adam designs independent adaptive learning rates for different parameters by calculating the first and second moment estimates of the gradient.

In order to prevent the model from converging to a local optimum and to speed up convergence, the learning rate decreases with the number of passes over the whole dataset (epochs). Let the initial learning rate be η; the learning rate of the current epoch is

η_i = η × 0. ^ ⌊epoch / ⌋    (8)

where the initial learning rate η = 0. and ⌊·⌋ denotes the floor operation.
Each parameter is then updated according to its gradient and the learning rate of the current training iteration:

w = w − η_i ∇w

where w is a parameter of the model and ∇w is the gradient of w w.r.t. the loss. This process of parameter updating is also called gradient descent.

In this paper, several typical CNNs that perform well on natural images were tested on the same data set. For every network, training stops when the performance on the validation set does not improve for three consecutive epochs or the epoch reaches the preset maximum. After training, the parameters of the network that performs best on the validation set are selected and tested on the test set to obtain the final performance of the model.
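The two loss functions of Eqs. (6)-(7) and the early-stopping rule can be sketched as follows; the patience of three epochs follows the text, while the helper names are illustrative:

```python
import numpy as np

def age_loss(pred: np.ndarray, label: np.ndarray) -> float:
    """Eq. (6): mean absolute error over the mini-batch."""
    return float(np.mean(np.abs(pred - label)))

def gender_loss(pred: np.ndarray, label: np.ndarray) -> float:
    """Eq. (7): root-mean-square error over the mini-batch."""
    return float(np.sqrt(np.mean((pred - label) ** 2)))

def should_stop(val_history, patience: int = 3) -> bool:
    """Early stopping as described in the text: stop when the validation
    metric (lower is better) has not improved for `patience` consecutive
    epochs. `val_history` lists the metric after each completed epoch."""
    if len(val_history) <= patience:
        return False
    best_before = min(val_history[:-patience])
    return min(val_history[-patience:]) >= best_before
```

The MAE keeps the age loss in years, so its scale is directly comparable to the evaluation metrics reported later; the RMSE on the 0/1 gender labels stays within [0, 1].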
For a specific convolutional neural network (CNN) with a fully connected (FC) layer as the classifier, the feature map is flattened into a one-dimensional vector before being input to the fully connected layer, and the input size of the fully connected layer is fixed; this requires the size of the image input to the network to be fixed (the Global Average Pooling (GAP) proposed in [69] can free CNNs from this limitation). Considering the preservation of the texture and structure information in the image and the efficiency of the network, the images in the data set are resized to × . However, the aspect ratio of the samples in the data set is not 1:1, and directly resizing would inevitably cause image distortion. Therefore, before resizing, the short side of every LC image in the dataset was padded to the length of its long side, so that the aspect ratio of the image is 1:1. Finally, the LC images are normalized to [0, 1].

The samples in the training set were augmented to increase their diversity and thereby improve the generalization performance of the model. The training set was augmented by random affine transformation, horizontal flip, and vertical flip.

In order to compare the performance of these models comprehensively, several metrics that are meaningful in practice were calculated. The error (E) of an individual sample is the true age of the sample minus the age predicted by the network. The absolute error (AE) of an individual sample is the absolute value of its error. The median of the errors of all samples is E.Med, and the median of the absolute errors is AE.Med.
The average (µ), standard deviation (σ) and interquartile range (IQR) of the AE are also calculated. The mean and median are used to measure the overall age-estimation performance of the network, while the stability of the age-estimation performance is measured by the standard deviation and IQR.

Several metrics were also calculated to evaluate the performance of the model on gender classification. The output of the neural network for gender classification is a continuous value between 0 and 1. In this paper, the threshold was set to 0.5: if the output value of the network was less than the threshold, the predicted result was regarded as male, otherwise as female (if the gender of the sample is male, the gender label is coded as 0, otherwise as 1). Accuracy is the proportion of predicted results consistent with the label, which is used to assess whether the network can separate the two categories well. However, the calculation of accuracy relies on a specific threshold, which makes it an incomplete and not fully objective measure of network performance. Therefore another metric, ROC-AUC, which is widely used to assess the robustness of classification models, was also calculated. For each threshold, a pair of sensitivity and specificity values is calculated, giving a point on the plane with sensitivity and specificity as coordinates. The ROC is the curve obtained by connecting all of these points, and the area under the curve is the ROC-AUC.

In order to select a basic network suitable for age estimation and gender classification of LC images, several typical CNNs that perform well on natural images were tested on our data set, because natural images and medical images share some low-level features, such as texture and edges. As shown in Table
IV, EfficientNet-B0 performed much better than the other CNNs, so it was chosen as our basic network in this work.

After the basic model was trained, saliency maps were generated to observe the significance of each part of the LC image for age estimation, as shown in Fig. 2.

By comparing the age estimation performance on the test set of the basic network with and without Grad-CAM, we built a decision block to determine for which ages the network needs to be connected to Grad-CAM and for which ages it only needs a copy of the LC image, as shown in Fig. 2. Based on this decision block, we trained EfficientNet-B0 again. The difference is that during training, the decision block determines whether the input of the network is the concatenation of the LC image and its Grad-CAM or the LC image and its copy, while during testing, for all ages, the LC image and its Grad-CAM are concatenated as the input of the network.

In order to assess the contribution of each salient part of the LC image, we divided the LC image into three parts as shown in Fig. 3. The A, B, and C parts of the LC image were used as the input of the network respectively to assess the performance of each single part for age estimation. After that, the A, B, and C parts of the LC image were combined in pairs as the input of the network to assess the age estimation performance when one area of the LC image is missing. The split of the data set, the training parameters and the network remain unchanged.

After the contributions of Grad-CAM and each part of the LC image to age estimation were assessed, we assessed the influence of gender classification on age estimation. When the network performs age estimation and gender classification at the same time, instead of simply adding L_age and L_gender to obtain the overall loss L, L_age and L_gender are multiplied by the factors α and 1 − α respectively before the addition, because the scales of L_age and L_gender are different, which prevents the network from tending to optimize only L_age. The factor α was set to 0. in this paper, which is the ratio of the range of gender label values to the range of age label values:

L = α L_age + (1 − α) L_gender    (9)

IV. RESULTS
As shown in Table IV, for age estimation, EfficientNet-B0 performs far better than the other networks in terms of the average and dispersion of the prediction error, except in the – age range, and requires less memory and computation. The overall mean absolute error (MAE) and standard deviation of the absolute error (SD) estimated by EfficientNet-B0 over all ages are . (years) and . (years), respectively. The basic network performs best and worst in the – and – age ranges, respectively, which is highly correlated with the sample sizes of these two age ranges. This is why we chose EfficientNet-B0 as the basic network for our data set. However, it can be seen from the results that the performance and stability of EfficientNet-B0 were not the best among the compared CNNs for the samples aged from 26 to 40.
TABLE IV: Comparison of CNN model performance. µ ± σ: mean ± standard deviation; AE.Med.: median absolute error; IQR: interquartile range. The error metrics are given in years.
[Table IV body: µ ± σ for Res18 and the other compared CNNs in each age range (4-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, All); the numeric values were lost in extraction.]
Fig. 5: Comparison of the performance of the basic network and the basic network constrained by the saliency map.

The performance when Grad-CAM is added to the input of the network is shown in Fig. 5. It can be seen that after adding Grad-CAM, compared to using only LC images as input, the MAE of our method on samples from 26 to 30 years old was significantly reduced. More importantly, the addition of Grad-CAM reduced the overall SD, which means that the performance of the network is more stable and the result is more reliable.

As shown in Fig. 6 and Table V, after the retest mechanism is added, the input of the network differs for samples of different ages, so the advantages of using and of not using Grad-CAM are combined. Therefore, for both the 4-25 and the 26-40 age groups, the MAE was no worse than the MAE obtained using only LC images as input, and the overall MAE and SD were also reduced. Compared with the improved MAE, the improvement in SD deserves more attention, because when the mean absolute errors are similar, a smaller standard deviation means that the performance of the network is more stable, which is of great significance in forensic practice.

The MAE and SD using only one part of the LC image
Fig. 6: Performance comparison of the basic network, the network using saliency map constraints, and the network using saliency map constraints with the retest mechanism applied.

or with one part of the LC image missing are compared in Fig. 7. As can be seen in Fig. 7, the performance of using only part A for age estimation is significantly better than using only one of the other two parts. Naturally, when part A was missing, using the remaining two parts for age estimation gave the worst performance. However, the effect of using only a certain part, or of missing a certain part, on age estimation is not very large, which shows that in forensic practice age estimation based on only part of the anatomical structure is still credible.

As shown in Table V, EfficientNet-B0 can be used for age estimation and gender classification simultaneously. Moreover, owing to the introduction of gender information, although only LC images are used as the input of the network, the performance of age estimation was better than that of the other methods. The MAE and SD of age estimation with gender classification are . and . respectively. The performance comparison between age estimation with gender classification and the basic method is shown in Fig. 8. The ROC curve for gender classification is shown in Fig. 9. The accuracy and ROC-AUC of gender classification are . and . respectively.

V. DISCUSSION
In recent years, owing to the availability of medical data sets and the improvement of computer processing power, the application of deep learning algorithms in computer vision has surged, and deep learning has quickly become the preferred method for analyzing medical images. Convolutional neural networks have irreplaceable advantages in image processing: parameter sharing and local receptive fields allow image features to be extracted with fewer parameters without losing the spatial information of the image. Therefore, the convolutional neural network plays an irreplaceable role in medical image processing and is one of the deep learning models most favored by researchers.

Deep learning has also brought about tremendous changes in age estimation research. It has clear advantages over traditional methods: 1) it avoids the influence of subjective factors on age estimation; 2) its accuracy and efficiency are higher than those of traditional manual methods; 3) it is not labor-intensive.

Because the lateral cephalogram contains the cervical spine, all teeth and craniofacial bones, as well as the craniofacial soft tissue, it can simultaneously reflect the aging changes of all these areas. So far, age estimation from lateral cephalometric radiographs has remained unexplored, which is why we chose them as the research object.

By comparing the MAEs and SDs for each age group obtained by the four methods in this study (shown in Table V), we found that the MAE of the basic network is less than 1 year for subjects younger than 15, which shows that the accuracy of the basic network is very high under the age of 15. With increasing age, the MAE of the basic network increases. Because traditional methods using dental images cannot accurately infer ages over 25, and the number of LC images in the over-25 group is smaller than in the under-25 group in our data set, we calculated the MAEs of the 4-25 and 26-40 age groups separately.
The performance of the basic network is far superior to that of traditional methods, both in the accuracy and in the efficiency of age estimation.

Grad-CAM can describe the contribution of each area of the image to the output and visualize it, so that we know from which parts of the LC image the age estimate is derived and how much each part contributes. The interpretability of this method is crucial and is of great significance for understanding the working mechanism of the network. When using EfficientNet-B0 with Grad-CAM to infer age, the MAE of the 26-40 age group is significantly lower than that of the basic network (a decrease of . ), which indicates that Grad-CAM can improve the accuracy of age inference for the age group with a small sample size.

The retest method combines the advantages of the former two methods. Compared with the basic network, the MAE of the 26-40 age group is significantly reduced (a reduction of . ), and compared with the EfficientNet-B0 with Grad-CAM method, the MAE of the 4-25 age group is also significantly reduced. The SDs of the 4-25 and 26-40 age groups obtained by the retest method are smaller than those of the former two methods, which shows that the age inference of the retest method is more stable than that of the former two methods.

When using EfficientNet-B0 for age estimation and gender classification at the same time, the MAE of the 26-40 age group is smaller than that of the retest method (a reduction of . ). From the experimental results, the accuracy and stability of this method are the best among the methods used in this article, but it requires gender labels to be available when training the network. The improvement that gender information brings to age estimation is mainly manifested in samples past the developmental period (the 26-40 age group).
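As a concrete illustration of the joint training objective from Eq. (9), the weighted combination of the two losses can be written as below. This is a minimal sketch: the α value shown is a hypothetical placeholder, since the paper derives it from the ratio of the gender-label range to the age-label range.

```python
# Weighted multi-task loss from Eq. (9): the age and gender losses are
# combined with factors alpha and (1 - alpha) so that the larger-scale age
# loss does not dominate the optimization.
ALPHA = 0.1  # placeholder value, not the paper's actual setting

def combined_loss(l_age, l_gender, alpha=ALPHA):
    return alpha * l_age + (1.0 - alpha) * l_gender
```

Because l_age (a regression error in years) is typically much larger than l_gender (a binary classification loss), a small α keeps the two terms on comparable scales.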
It also shows the influence of gender on age inference, especially for adults. We take each year as an age group, select the LC images whose estimated age is closest to the actual age, and generate their saliency maps to determine the key
Fig. 7: Performance comparison of age estimation when the input data is the entire LC image, one of the three parts A, B, and C, or the image with one part missing.
TABLE V: Comparison of performance in different age ranges and overall performance when different methods and input data are applied. EFFI.: basic network; EFFI.+CAM: basic network with saliency map constraints; EFFI.+CAM+RTS.: basic network with saliency map constraints and the retest mechanism applied; PAR.A, PAR.B and PAR.C: only part A, part B or part C is used as the input of the basic network, respectively; PAR.A+B, PAR.A+C and PAR.B+C: part C, part B or part A is missing from the input of the basic network, respectively; EFFI.+GEN.: the basic network performs age estimation and gender classification at the same time.
[Table V body: µ ± σ for each method (EFFI., EFFI.+CAM, EFFI.+CAM+RTS., PAR.A, PAR.B, PAR.C, PAR.A+B, PAR.A+C, PAR.B+C, EFFI.+GEN.) in each age range (4-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 4-25, 26-40, All); only the first value, 0.88 for EFFI. in the 4-10 range, survived extraction.]

areas of different age groups that are most closely related to age inference, as shown in Fig. 10. We have the following findings: 1) Before the age of nine, the skull and maxilla showed a strong correlation, which is consistent with the developmental stage of the skull and maxilla. 2) At the age of 10-13, the maxilla, mandible, cervical vertebrae and skull showed a strong correlation, suggesting that the craniofacial bones and cervical vertebrae develop and change rapidly in puberty. 3) The saliency maps over 14 years old show a consistent salient area, which is divided into three regions: the tooth area, the cervical spine area, and the craniofacial area without the teeth. 4) With increasing age, the salient area grows, indicating that more tissues in the LC image undergo aging changes as age increases. 5) In the saliency maps over the age of 14, the most salient area related to age inference shows striking consistency. It is located in the upper part of the external auditory canal. Due to the overlap of the left and right bone tissues in the LC image, we cannot determine to which part of the head the most salient area belongs; this requires subsequent 3D image research.
Fig. 8: Performance comparison of age estimation alone and age estimation with gender classification.
Fig. 9: ROC curve of the basic network applied to gender classification. The AUC of the ROC curve is 0.97.

The average saliency maps of each one-year age group (Fig. 11) also reflect the above findings well.

By comparing the MAEs and SDs of each part for each age group (shown in Table V), we found that among the three single parts, age estimation based on the teeth part is the most accurate in the 4-25 age group, while in the 26-40 age group the cervical spine part is the most accurate. The results indicate that before and after the age of 25, the teeth and the cervical spine, respectively, have the strongest correlation with age inference. Among the pairwise combinations of the three parts, in the 4-25 age group the accuracy of age estimation based on part A+B (without the cervical spine) is the highest, followed by part A+C; a similar pattern holds in the 26-40 age group, which once again illustrates the importance of the teeth part.

The average saliency maps of each part are shown in Fig. 12, Fig. 13 and Fig. 14. The salient area of each part is consistent with that of the whole LC image, and the location and shape of the salient area in the heat maps of each one-year age group show a high degree of consistency. The salient areas of the teeth part are mainly the teeth and periodontal tissues, especially the upper posterior teeth; this should be related to the wear of the teeth and the aging changes of the periodontal tissue (Fig. 12). The salient areas of the craniofacial part without the teeth are mainly in the midface (Fig. 13). Many scholars have conducted in-depth research on the aging changes of the orbit, whose volume increases with age. The development of the maxilla is also a research hotspot, but the aging of the maxilla in adults has not been covered. The saliency maps remind us that it is necessary to conduct aging research on the other anatomical structures in the middle of the face.
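The per-age average saliency maps discussed above (as in Fig. 11) can be produced by grouping the Grad-CAM heatmaps by chronological age, one group per year, and averaging them pixel-wise. A minimal sketch with names of our own, not the paper's code:

```python
import numpy as np
from collections import defaultdict

# Group Grad-CAM heatmaps by age and average each group pixel-wise.
# `samples` is a list of (age, heatmap) pairs; all heatmaps within one age
# group are assumed to share the same spatial size.
def average_saliency_by_age(samples):
    groups = defaultdict(list)
    for age, heatmap in samples:
        groups[age].append(np.asarray(heatmap, float))
    return {age: np.mean(maps, axis=0) for age, maps in groups.items()}

samples = [(28, np.zeros((4, 4))), (28, np.full((4, 4), 2.0)), (29, np.ones((4, 4)))]
avg = average_saliency_by_age(samples)
```

Averaging within each one-year group suppresses per-image noise, which is what makes the consistent salient regions described in the text stand out.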
The salient area of the cervical spine part covers all the cervical vertebrae and intervertebral discs in the LC image. The morphological changes of the cervical spine are used to determine the pubertal growth spurt of adolescence [27-29]. The cervical spine has also been used to infer age and gender [30-32], but mainly for children and adolescents; there is no relevant research using the cervical spine to infer the age of adults. The cervical spine consists of 7 vertebral bodies and intervertebral discs. After development is complete, structural changes of the cervical spine begin in middle age, but sometimes earlier [33]. Intervertebral disc degeneration begins in adolescence and, as it progresses, can also lead to morphological alterations of the vertebral bodies. Cervical lordosis increases with age [34]. These changes are difficult to exploit with traditional methods of inferring age.

In this study, we found that the accuracy of age estimation based on the cervical part is the highest among all parts in the 26-40 age group, which indicates that the cervical region deserves high attention and further study.

Since only orthodontic patients have LC images taken in oral clinical work, there are very few orthodontic patients over 40 years old in China, and orthodontic patients under 4 years old rarely need lateral radiographs, so the sample age range of this study is 4-40 years. That is to say, the methods used in this study cannot be used to infer ages over 40. Because of the way LC images are taken, the left and right craniofacial bones and teeth overlap, so the salient regions in Grad-CAM cannot accurately delineate the anatomical structures used for age estimation; this needs to be further compared and determined by studies of three-dimensional images in the future. Compared with traditional manual age estimation methods, deep learning methods are data-driven and need large data sets, which is a common defect of deep learning methods.
However, meta-learning and small-sample learning methods make it possible to achieve acceptable performance with a small amount of data. This is also our future research interest.
VI. CONCLUSION
1. In this paper, we proposed a novel method in which LC images are used for age estimation for the first time, and it achieved promising results. Aiming at the problem of fewer samples and larger errors after the age of 25, the saliency maps generated by Grad-CAM and the trained network are used to constrain the attention of the network, thereby improving the overall performance of the network, especially for samples over the age of 25.

Fig. 10: Sample saliency maps for each age. As shown by the gradient color bar, the brighter the region, the greater its contribution to the age estimation.
Fig. 11: The average of all saliency maps for each age. In order to show the relative position of the saliency map and the LC image, a 28-year-old LC image and its corresponding average saliency map are placed together. As shown by the gradient color bar, the brighter the region, the greater its contribution to the age estimation.

2. The saliency map was applied to visualize the contribution of each part of the image to the age estimation. The
results revealed some new areas worthy of attention.

3. Based on the saliency map, we also explored the performance of age estimation when different regions are involved or missing. The results proved that the information in the LC image is redundant: a single part can be used for age estimation, and its performance is comparable to that of the entire image.

4. The gain that gender brings to age estimation is also verified in our work. When outputting gender and age at the same time, the accuracy and stability of the age inference are the best among the methods used in this article; in particular, the MAE of the age group over 25 years old is greatly reduced, by 18.8%.

Fig. 12: The average of the saliency maps of all part A images at each age. In order to show the relative position of the saliency map and the LC image, part A of a 28-year-old image and its corresponding saliency map are placed together.

Fig. 13: The average of the saliency maps of all part B images at each age. In order to show the relative position of the saliency map and the LC image, part B of a 28-year-old image and its corresponding saliency map are placed together.

Fig. 14: The average of the saliency maps of all part C images at each age. In order to show the relative position of the saliency map and the LC image, part C of a 28-year-old image and its corresponding saliency map are placed together.

ACKNOWLEDGMENT

REFERENCES

[1] D. H. Haines, "Dental identification in the Rijeka air disaster,"
Forensic Science, vol. 1, no. 3, pp. 313–321, 1972.
[2] R. Cameriere, S. De Luca, R. Biagi, M. Cingolani, G. Farronato, and L. Ferrante, "Accuracy of three age estimation methods in children by measurements of developing teeth and carpals and epiphyses of the ulna and radius," Journal of Forensic Sciences, vol. 57, no. 5, pp. 1263–1270, 2012.
[3] V. Vieth, R. Schulz, W. Heindel, H. Pfeiffer, B. Buerke, A. Schmeling, and C. Ottow, "Forensic age assessment by 3.0 T MRI of the knee: proposal of a new MRI classification of ossification stages," European Radiology, vol. 28, no. 8, pp. 3255–3262, 2018.
[4] L. Hackman, C. M. Davies, and S. Black, "Age estimation using foot radiographs from a modern Scottish population," Journal of Forensic Sciences, vol. 58, pp. S146–S150, 2013.
[5] N. R. Langley, "The lateral clavicular epiphysis: fusion timing and age estimation," International Journal of Legal Medicine, vol. 130, no. 2, pp. 511–517, 2016.
[6] F. Savall, F. Hérin, P. A. Peyron, D. Rougé, E. Baccino, P. Saint-Martin, and N. Telmon, "Age estimation at death using pubic bone analysis of a virtual reference sample," International Journal of Legal Medicine, vol. 132, no. 2, pp. 609–615, 2018.
[7] K. M. Hartnett, "Analysis of age-at-death estimation using data from a new, modern autopsy sample—part I: pubic bone," Journal of Forensic Sciences, vol. 55, no. 5, pp. 1145–1151, 2010.
[8] K. Krishan, T. Kanchan, and A. K. Garg, "Dental evidence in forensic identification–an overview, methodology and present status," The Open Dentistry Journal, vol. 9, p. 250, 2015.
[9] D. Sweet, "Forensic dental identification," Forensic Science International, vol. 201, no. 1-3, pp. 3–4, 2010.
[10] J. Savita, B. Y. Kumar, and N. Mamatha, "Teeth as age estimation tool in children and adolescents," Journal of Medicine, Radiology, Pathology and Surgery, vol. 4, no. 4, pp. 12–15, 2017.
[11] C. Stavrianos, D. Mastagas, I. Stavrianou, and O. Karaiskou, "Dental age estimation of adults: A review of methods and principals," Res J Med Sci, vol. 2, no. 5, pp. 258–68, 2008.
[12] D. Anderson, G. Thompson, and F. Popovich, "Age of attainment of mineralization stages of the permanent dentition," Journal of Forensic Science, vol. 21, no. 1, pp. 191–200, 1976.
[13] A. Demirjian, H. Goldstein, and J. M. Tanner, "A new system of dental age assessment," Human Biology, pp. 211–227, 1973.
[14] C. F. Moorrees, E. A. Fanning, and E. E. Hunt Jr, "Age variation of formation stages for ten permanent teeth," Journal of Dental Research, vol. 42, no. 6, pp. 1490–1502, 1963.
[15] J. Seth, A. Agarwal, H. Aeran, Y. Krishnan et al., "Dental age estimation in children and adolescents," Indian Journal of Dental Sciences, vol. 10, no. 4, p. 248, 2018.
[16] G. Gustafson, "Age determination on teeth," Journal of the American Dental Association, vol. 41, no. 1, pp. 45–54, 1950.
[17] W. Maples, "An improved technique using dental histology for estimation of adult age," Journal of Forensic Science, vol. 23, no. 4, pp. 764–770, 1978.
[18] T. Solheim, "A new method for dental age estimation in adults," Forensic Science International, vol. 59, no. 2, pp. 137–147, 1993.
[19] J. Ball, "A critique of age estimation using attrition as the sole indicator," The Journal of Forensic Odonto-Stomatology, vol. 20, no. 2, p. 38, 2002.
[20] E. A. Huttner, D. C. Machado, R. B. De Oliveira, A. G. F. Antunes, and E. Hebling, "Effects of human aging on periodontal tissues," Special Care in Dentistry, vol. 29, no. 4, pp. 149–155, 2009.
[21] C. Li and G. Ji, "Age estimation from the permanent molar in northeast China by the method of average stage of attrition," Forensic Science International, vol. 75, no. 2-3, pp. 189–196, 1995.
[22] T. Solheim, "Recession of periodontal ligament as an indicator of age," The Journal of Forensic Odonto-Stomatology, vol. 10, no. 2, p. 32, 1992.
[23] V. Vlaskalic, R. L. Boyd, and S. Baumrind, "Etiology and sequelae of root resorption," in Seminars in Orthodontics, vol. 4, no. 2. Elsevier, 1998, pp. 124–131.
[24] J.-I. Yun, J.-Y. Lee, J.-W. Chung, H.-S. Kho, and Y.-K. Kim, "Age estimation of Korean adults by occlusal tooth wear," Journal of Forensic Sciences, vol. 52, no. 3, pp. 678–683, 2007.
[25] S. Gupta, A. Chandra, A. Agnihotri, O. P. Gupta, and N. Maurya, "Age estimation by dentin translucency measurement using digital method: An institutional study," Journal of Forensic Dental Sciences, vol. 9, no. 1, p. 42, 2017.
[26] N. Mohan and M. T. Sabitha Gokulraj, "Age estimation by cemental annulation rings," Journal of Forensic Dental Sciences, vol. 10, no. 2, p. 79, 2018.
[27] A. Singhal, V. Ramesh, and P. Balamurali, "A comparative analysis of root dentin transparency with known age," Journal of Forensic Dental Sciences, vol. 2, no. 1, p. 18, 2010.
[28] T. Solheim, "Dental root translucency as an indicator of age," European Journal of Oral Sciences, vol. 97, no. 3, pp. 189–197, 1989.
[29] U. Wittwer-Backofen, "Age estimation using tooth cementum annulation," in Forensic Microscopy for Skeletal Tissues. Springer, 2012, pp. 129–143.
[30] K. Lackovic and R. Wood, "Tooth root colour as a measure of chronological age," The Journal of Forensic Odonto-Stomatology, vol. 18, no. 2, pp. 37–45, 2000.
[31] S. Martin-de las Heras, A. Valenzuela, R. Bellini, C. Salas, M. Rubino, and J. A. Garcia, "Objective measurement of dental color for age estimation by spectroradiometry," Forensic Science International, vol. 132, no. 1, pp. 57–62, 2003.
[32] K. Alkass, B. A. Buchholz, S. Ohtani, T. Yamamoto, H. Druid, and K. L. Spalding, "Age estimation in forensic sciences: application of combined aspartic acid racemization and radiocarbon analysis," Molecular & Cellular Proteomics, vol. 9, no. 5, pp. 1022–1030, 2010.
[33] B. Bekaert, A. Kamalandua, S. C. Zapico, W. Van de Voorde, and R. Decorte, "Improved age determination of blood and teeth samples using a selected set of DNA methylation markers," Epigenetics, vol. 10, no. 10, pp. 922–930, 2015.
[34] K. L. Spalding, B. A. Buchholz, L.-E. Bergman, H. Druid, and J. Frisén, "Age written in teeth by nuclear tests," Nature, vol. 437, no. 7057, pp. 333–334, 2005.
[35] R. Cameriere, L. Ferrante, and M. Cingolani, "Age estimation in children by measurement of open apices in teeth," International Journal of Legal Medicine, vol. 120, no. 1, pp. 49–52, 2006.
[36] A. d. C. S. Azevedo, N. Z. Alves, E. Michel-Crosato, M. Rocha, R. Cameriere, and M. G. H. Biazevic, "Dental age estimation in a Brazilian adult population using Cameriere's method," Brazilian Oral Research, vol. 29, no. 1, pp. 1–9, 2015.
[37] S. Mittal, S. G. Nagendrareddy, M. L. Sharma, P. Agnihotri, S. Chaudhary, and M. Dhillon, "Age estimation based on Kvaal's technique using digital panoramic radiographs," Journal of Forensic Dental Sciences, vol. 8, no. 2, p. 115, 2016.
[38] K. R. Patil and R. N. Mody, "Determination of sex by discriminant function analysis and stature by regression analysis: a lateral cephalometric study," Forensic Science International, vol. 147, no. 2-3, pp. 175–180, 2005.
[39] N. Gandhi, S. Jain, H. Kahlon, A. Singh, R. S. Gambhir, and A. Gaur, "Significance of mandibular canine index in sexual dimorphism and aid in personal identification in forensic odontology," Journal of Forensic Dental Sciences, vol. 9, no. 2, p. 56, 2017.
[40] B. N. Rajarathnam, M. P. David, and A. P. Indira, "Mandibular canine dimensions as an aid in gender estimation," Journal of Forensic Dental Sciences, vol. 8, no. 2, p. 83, 2016.
[41] Q. Dou, C. Ouyang, C. Chen, H. Chen, and P.-A. Heng, "Unsupervised cross-modality domain adaptation of ConvNets for biomedical image segmentations with adversarial loss," arXiv preprint arXiv:1804.10916, 2018.
[42] A. Y. Hannun, P. Rajpurkar, M. Haghpanahi, G. H. Tison, C. Bourn, M. P. Turakhia, and A. Y. Ng, "Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network," Nature Medicine, vol. 25, no. 1, p. 65, 2019.
[43] H. Pratt, F. Coenen, and Y. Zheng, "Feature visualisation of classification of diabetic retinopathy using a convolutional neural network," in KHD@IJCAI, 2019, pp. 23–29.
[44] W. De Back, S. Seurig, S. Wagner, B. Marré, I. Roeder, and N. Scherf, "Forensic age estimation with Bayesian convolutional neural networks based on panoramic dental X-ray imaging," 2019.
[45] S. Alkaabi, S. Yussof, and S. Al-Mulla, "Evaluation of convolutional neural network based on dental images for age estimation," IEEE, 2019, pp. 1–5.
[46] E. H. Houssein, N. Mualla, and M. Hassan, "Dental age estimation based on X-ray images," Computers, Materials & Continua, vol. 62, no. 2, pp. 591–605, 2020.
[47] I. Ilić, M. Vodanović, and M. Subašić, "Gender estimation from panoramic dental X-ray images using deep convolutional networks," in IEEE EUROCON 2019-18th International Conference on Smart Technologies. IEEE, 2019, pp. 1–5.
[48] J. Kim, W. Bae, K.-H. Jung, and I.-S. Song, "Development and validation of deep learning-based algorithms for the estimation of chronological age using panoramic dental X-ray images," 2019.
[49] N. Vila-Blanco, M. J. Carreira, P. Varas-Quintana, C. Balsa-Castro, and I. Tomas, "Deep neural networks for chronological age estimation from OPG images," IEEE Transactions on Medical Imaging, vol. 39, no. 7, pp. 2374–2384, 2020.
[50] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
[51] D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen, "DoubleU-Net: A deep convolutional neural network for medical image segmentation," arXiv preprint arXiv:2006.04868, 2020.
[52] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[53] A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J. Van Riel, M. M. W. Wille, M. Naqibullah, C. I. Sánchez, and B. van Ginneken, "Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1160–1169, 2016.
[54] C. Han, Y. Kitamura, A. Kudo, A. Ichinose, L. Rundo, Y. Furukawa, K. Umemoto, Y. Li, and H. Nakayama, "Synthesizing diverse lung nodules wherever massively: 3D multi-conditional GAN-based CT image augmentation for object detection," IEEE, 2019, pp. 729–737.
[55] H. Lee, S. Tajmir, J. Lee, M. Zissen, B. A. Yeshiwas, T. K. Alkasab, G. Choy, and S. Do, "Fully automated deep learning system for bone age assessment," Journal of Digital Imaging, vol. 30, no. 4, pp. 427–441, 2017.
[56] N. Zhao, N. Tong, D. Ruan, and K. Sheng, "Fully automated pancreas segmentation with two-stage 3D convolutional neural networks," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 201–209.
[57] F. Shi, J. Wang, J. Shi, Z. Wu, Q. Wang, Z. Tang, K. He, Y. Shi, and D. Shen, "Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19," IEEE Reviews in Biomedical Engineering, 2020.
[58] T. D. Bui, J.-J. Lee, and J. Shin, "Incorporated region detection and classification using deep convolutional networks for bone age assessment," Artificial Intelligence in Medicine, vol. 97, pp. 1–8, 2019.
[59] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[60] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Neural Networks for Perception. Elsevier, 1992, pp. 65–93.
[61] M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," arXiv preprint arXiv:1905.11946, 2019.
[62] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[63] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, "MnasNet: Platform-aware neural architecture search for mobile," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2820–2828.
[64] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[65] A. Vellido, "The importance of interpretability and visualization in machine learning for applications in medicine and health care," Neural Computing and Applications, pp. 1–15, 2019.
[66] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.
[67] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[68] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[69] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.