Analysis of Hand-Crafted and Automatic-Learned Features for Glaucoma Detection Through Raw Circumpapillary OCT Images
Gabriel García, Adrián Colomer, Valery Naranjo. Instituto de Investigación e Innovación en Bioingeniería (I3B), Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia, Spain. [email protected]
Abstract.
Taking into account that glaucoma is the leading cause of blindness worldwide, we propose in this paper three different learning methodologies for glaucoma detection in order to elucidate that traditional machine-learning techniques could outperform deep-learning algorithms, especially when the image data set is small. The experiments were performed on a private database composed of 194 glaucomatous and 198 normal B-scans diagnosed by expert ophthalmologists. As a novelty, we only considered raw circumpapillary OCT images to build the predictive models, without using other expensive tests such as visual field and intraocular pressure measures. The results ratify that the proposed hand-driven learning model, based on novel descriptors, outperforms the automatic learning. Additionally, the hybrid approach consisting of a combination of both strategies reports the best performance, with an area under the ROC curve of 0.85 and an accuracy of 0.82 during the prediction stage.
Keywords:
Glaucoma, circumpapillary OCT, hand-driven learning, deep learning, hybrid classification.
Glaucoma is a chronic optic neuropathy characterised by causing several visual field defects and structural changes in the optic nerve, such as a thinning of the retinal nerve fibre layer (RNFL) [1]. Nowadays, this degenerative disease is the leading cause of blindness worldwide and is expected to affect 111.8 million people in 2040 [2]. The glaucoma diagnosis includes different expensive analyses (pachymetry, tonometry and visual field tests, among others) besides a subjective interpretation of expert ophthalmologists who often differ, especially in terms of early identification [3]. Currently, imaging techniques based on fundus images and optical coherence tomography (OCT) have become a powerful tool to address the glaucoma diagnosis.
Related work.
Timely treatment of glaucoma is essential to avoid irreversible vision loss [2], so several computer-aided diagnosis systems and predictive algorithms focused on OCT and fundus images have been proposed in the literature to achieve early detection. Most of them were performed through traditional machine-learning (ML) techniques based on feature extraction and selection methods [4–6]. All of them had in common the use of additional parameters relevant for glaucoma diagnosis, such as the intraocular pressure (IOP) and visual field (VF) tests, besides the OCT images. Unlike these works, we propose an innovative end-to-end system able to predict the glaucoma disease just from raw circumpapillary OCT images, without taking into account external expensive tests, like VF or IOP. We aim to elucidate the added value that these OCT samples around the optic nerve head (ONH) of the retina can provide for glaucoma detection. In recent years, the overwhelming irruption of deep learning (DL) has replaced the traditional hand-crafted methods, but most of the state-of-the-art studies used fundus images [7, 8] or RNFL thickness probability maps extracted by combining fundus images and OCT B-scans [9, 10]. However, to the best of the authors' knowledge, we are the first to apply deep learning to evidence glaucoma just from raw circumpapillary OCT images.
Contribution of this work.
In this paper, we propose a comparison between traditional and contemporary machine-learning models to analyse whether hand-driven approaches can outperform deep-learning algorithms for glaucoma detection, especially when addressing small databases. We hypothesise that CNNs cannot replace the way in which humans encode information from a subjective point of view, in the same way that CNNs are able to identify hidden patterns that are not within reach of the human eye. For this reason, as the main novelty, we propose a hybrid model by fusing the features extracted from both ML and DL approaches to identify glaucoma.
The present work was carried out making use of a private database, coming from the Oftalvist Ophthalmic Clinic, which consists of 392 B-scans around the ONH of the retina. In particular, 194 samples from 97 patients were diagnosed by expert ophthalmologists as glaucomatous, whereas 198 circumpapillary images from 99 patients were associated with normal eyes. Note that each B-scan, of dimensions M × N = 496 × 768 pixels, was acquired using the Heidelberg Spectralis OCT system, which allows obtaining an axial resolution of 4-5 µm.

In order to provide reliable results, we carried out a patient-based partitioning of the database to separate the training set from the independent test set. In particular, 314 samples (158 healthy and 156 glaucomatous OCT images) were used to train the predictive models, whereas the remaining 78 (40 healthy and 38 glaucomatous B-scans) were employed to test them. Additionally, we applied an internal k = 5-fold cross-validation technique intending to manage the overfitting and select the best parameters during the validation stage. Finally, we used the entire training set to build the final predictive models, which were assessed on the external set.

This approach consists of three main phases corresponding to the feature extraction, feature selection and classification through several traditional ML classifiers. Note that we used the RNFL and retina structure segmentations extracted by the Heidelberg Spectralis system to perform this methodology.
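The patient-based partitioning described above (both eyes/scans of a patient always fall on the same side of the split) can be sketched as follows; a minimal illustration with numpy, where the function name and the 20% test fraction are assumptions for the example:

```python
import numpy as np

def patient_based_split(patient_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no patient appears in both sets."""
    rng = np.random.default_rng(seed)
    patients = np.unique(patient_ids)
    rng.shuffle(patients)
    n_test = int(round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    return train_idx, test_idx
```

Splitting at the patient level rather than the image level avoids the leakage that would arise if two scans of the same eye landed in both training and test sets.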
Feature extraction. For each circumpapillary OCT image I_i, being i = {1, 2, ..., P} and P = 392 the number of B-scans, we combine, as a novelty, different variables related to four main descriptors: RNFL thickness, texture variables, fractal analysis and demographic data (age and gender). Regarding the RNFL thickness, we propose in this paper an innovative way of codifying the information, unlike [5, 6] where the authors employed the measures directly extracted from the hardware system. Let T_i = {t_1, t_2, ..., t_j, ..., t_N} be the vector where each t_j ∈ T_i is the RNFL thickness calculated from the image I_i at position j; the proposed method groups the thickness values into a histogram vector h_{i,p}, where p = {1, 2, 3, 4}. For each value of p, h_{i,p} quantifies the number of thickness values t_j whose distance is ranged between D_p and D_{p+1}, being D = [0, ..., maxT_i) the vector of bin boundaries. The minimum (minT_i) and maximum (maxT_i) RNFL thickness values were also calculated for this descriptor category. Concerning the texture variables, the grey-level co-occurrence matrix (GLCM) [11] and local binary patterns (LBP) were applied for encoding the textural information contained in the retina region of each I_i. Variables such as contrast, correlation, energy, homogeneity, entropy, mean and standard deviation were calculated from a GLCM of dimensions 8 × 8, which quantifies how many times a pixel with intensity x is adjacent to another with intensity y. Additionally, in order to recognise local texture information, we combined the rotation-invariant uniform LBP^{riu2}_{P,R} operator, proposed in [12], with the rotation-invariant local variance
VAR descriptor, to finally compute the LBP variance (LBPV) histogram, proposed in [13] (see Fig. 1). It should be noted that similar texture descriptors have been previously considered for glaucoma detection from fundus images [14, 15] but, to the best of the authors' knowledge, this is the first time that GLCM and LBP features are applied to circumpapillary OCT images. Additionally, we analysed the fractal dimension in five directions (0, 30, 45, 60, 90 degrees) via the Hurst exponent [16] computation to determine the presence of underlying trends in the complexity of the retinal region of each I_i. After the feature extraction phase, 75 hand-crafted variables per learning instance were taken into account to address the next stage.
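The histogram-based RNFL thickness descriptor described above can be sketched as follows; a minimal illustration with numpy, where the bin boundaries D passed in the test are hypothetical, since the paper elides the exact distances:

```python
import numpy as np

def rnfl_thickness_features(thickness, edges):
    """Histogram-based RNFL descriptor: count of thickness values t_j per
    distance bin [D_p, D_{p+1}), plus the minimum and maximum thickness,
    yielding the six features shown in Fig. 4."""
    t = np.asarray(thickness, dtype=float)
    counts, _ = np.histogram(t, bins=edges)   # 4 bin counts h_{i,1..4}
    return np.concatenate([counts, [t.min(), t.max()]])
```

Compared with feeding the raw sector thicknesses to a classifier, the histogram compresses the N thickness samples into a fixed-length, position-independent summary.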
Fig. 1. (a-b) Examples of glaucomatous and normal samples. (c-d) LBP images. (e-f) VAR images. (g) 10-bin histograms of the LBPV operator from the LBP and VAR images.
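For the GLCM branch of the texture analysis, a simplified pure-numpy sketch is given below: it quantises the image to 8 grey levels, accumulates horizontally adjacent pixel pairs, and derives three of the Haralick statistics named above (the single horizontal offset and the quantisation scheme are assumptions; the paper does not state them):

```python
import numpy as np

def glcm_features(img, levels=8):
    """Grey-level co-occurrence matrix for horizontal neighbours and a few
    Haralick-style statistics (contrast, energy, homogeneity)."""
    # Quantise intensities to `levels` grey levels.
    q = (img.astype(float) / (img.max() + 1e-9) * (levels - 1)).round().astype(int)
    glcm = np.zeros((levels, levels))
    # Count how many times a pixel of level x is left-adjacent to level y.
    for x, y in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[x, y] += 1
    glcm /= glcm.sum()                        # normalise to a joint probability
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()
    energy = (glcm ** 2).sum()
    homogeneity = (glcm / (1.0 + np.abs(i - j))).sum()
    return contrast, energy, homogeneity
```

A full implementation would symmetrise the matrix and average over several offsets and directions, and add the remaining statistics (correlation, entropy, mean, standard deviation).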
Feature selection. An in-depth statistical analysis was carried out to select the most relevant variables in order to feed the proposed classifiers. Initially, a Kolmogorov-Smirnov test was applied to determine the distribution of the variables. Then,
Student's t-tests or Mann-Whitney U tests were performed to analyse the discriminatory ability of each variable v by comparing means or medians, respectively, depending on whether v followed a normal distribution or not. The correlation coefficient was also calculated to obtain the grade of independence between pairs of variables. Note that a level of significance α = 0.05 was defined for both hypothesis contrasts to discard the non-relevant features, as well as the redundant information when p-value < α.

Model training. Once the most relevant features were selected, we trained different ML classifiers such as support vector machines (SVM) and multi-layer perceptrons (MLP), in line with the state-of-the-art studies [4] and [6]. Several box constraints and kernel scale parameters for the SVM classifiers, and learning rates, loss functions, optimizers and network structures for the MLP network, were considered during the internal cross-validation (ICV) stage. A considerable outperformance of the MLP classifier with respect to the SVM was achieved using the adaptive gradient descent optimizer with a learning rate of 0.001 and the binary cross-entropy as a loss function. Concerning the network structure, one hidden layer with 8 neurons reported the best model performance. Note that the proposed hand-driven learning approach is represented by the blue lines in the flowchart exposed in Fig. 3.
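The normality-then-significance filtering described above can be sketched with scipy; a minimal version under the stated α = 0.05, which omits the correlation-based redundancy filtering for brevity (the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def select_features(X_pos, X_neg, alpha=0.05):
    """Keep feature indices whose class-wise distributions differ
    significantly: Kolmogorov-Smirnov normality check on the z-scored
    pooled sample, then Student's t-test (normal case) or Mann-Whitney
    U test (non-normal case) at significance level alpha."""
    keep = []
    for k in range(X_pos.shape[1]):
        a, b = X_pos[:, k], X_neg[:, k]
        pooled = np.concatenate([a, b])
        z = (pooled - pooled.mean()) / (pooled.std() + 1e-12)
        normal = stats.kstest(z, "norm").pvalue > alpha
        if normal:
            p = stats.ttest_ind(a, b).pvalue
        else:
            p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
        if p < alpha:
            keep.append(k)
    return keep
```

A production version would additionally drop one member of each highly correlated feature pair, as the text describes.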
Similarly to the previous phase, an empirical exploration of several hyperparameters was performed in order to build the best predictive deep-learning model during the ICV stage. Convolutional, pooling, dropout, batch normalisation and dense layers were applied in specific experimental combinations in search of the best network architecture. Also, we considered the use of data augmentation techniques to alleviate the problem of insufficient data, by creating artificial samples via geometric and intensity modifications of the original images. The best performance was achieved by training the CNN exposed in Fig. 2, using the Adadelta optimizer with a learning rate of 0.005, squared hinge as a loss function and a batch size of 32 during the ICV stage.

Fig. 2. Illustrative representation of the implemented CNN architecture (convolutional, max pooling, global max pooling and softmax layers; feature maps of 228 × 384 × 1, 228 × 384 × 32, 124 × 192 × 32, 124 × 192 × 64, 62 × 96 × 64 and 62 × 96 × 128, followed by a 128-dimensional descriptor).
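The geometric and intensity modifications mentioned above can be illustrated with a minimal numpy sketch; the specific transforms and their ranges are assumptions for the example, since the paper does not enumerate them:

```python
import numpy as np

def augment(img, rng):
    """Create one artificial training sample from an OCT B-scan via a
    random horizontal flip, a small vertical shift, and a global additive
    intensity offset, clipped back to the 8-bit range."""
    out = img.copy().astype(float)
    if rng.random() < 0.5:
        out = out[:, ::-1]           # geometric: horizontal flip
    shift = rng.integers(-5, 6)      # geometric: vertical shift in pixels
    out = np.roll(out, shift, axis=0)
    out += rng.uniform(-10, 10)      # intensity: global brightness offset
    return np.clip(out, 0, 255)
```

Each call produces a slightly different sample, so the effective training set grows without collecting new scans.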
As the main novelty of this paper, we propose a hybrid model to address glaucoma detection taking into account both the hand-crafted and the automatically learned features. Our aim is to combine the original human point of view with the hidden potential enclosed in the CNNs. Specifically, we made use of the previously defined deep-learning base model as a feature extractor for each OCT image. Then, we fused the 75 ML and 128 DL extracted features to form the final feature vector, from which we performed, in the same conditions, the feature selection and MLP training stages carried out in Section 3.2. Finally, the three proposed models were assessed and compared using the test set, according to the flowchart exposed below. Note that the information relative to the hybrid approach can be interpreted by the yellow lines in Fig. 3.
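Conceptually, the fusion step reduces to a per-sample concatenation of the two descriptor sets into a single 203-dimensional vector (75 + 128); a minimal sketch, with a hypothetical function name:

```python
import numpy as np

def fuse_features(ml_feats, dl_feats):
    """Concatenate the 75 hand-crafted and 128 CNN-extracted descriptors
    into one feature vector per OCT image; the fused matrix then feeds
    the same feature-selection and MLP-training stages."""
    ml = np.atleast_2d(ml_feats)
    dl = np.atleast_2d(dl_feats)
    assert ml.shape[0] == dl.shape[0], "one row per OCT image"
    return np.hstack([ml, dl])
```

In practice the two feature groups should be scaled comparably (e.g. standardised) before fusion, so that neither dominates the subsequent statistical selection.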
Regarding the hand-driven learning approach, 25 from a total of 75 features that composed a learning instance were selected after the statistical analysis.
Fig. 3.
Flowchart detailing the proposed ML, DL and hybrid approaches, in blue, green and yellow, respectively.
Otherwise, concerning the hybrid approach, 100 features from a total of 203 were reported as relevant variables to address the MLP training stage. Note that both the ML and the hybrid final feature vectors included variables corresponding to the four kinds of descriptors used in this work. In addition, all the proposed features corresponding to the new RNFL thickness histogram-based method resulted statistically significant. A boxplot relative to these features is exposed in Fig. 4 to show the discriminatory ability of the proposed new descriptor. Besides, we also represent the correlation matrix of the same variables to evidence the independence level between them.

Validation results. Classification results reached during the validation stage are detailed in Table 1 to objectively compare the proposed hand-driven learning (HDL), deep-learning (DL) and hybrid-learning methodologies. Different figures of merit, such as sensitivity (SN), specificity (SPC), F-score (FS), accuracy (ACC) and area under the ROC curve (AUC), are taken into account to assess the models, providing reliable results. The findings are directly in line with the hypothesis postulated in Section 1, since the hand-driven learning approach has demonstrated to surpass deep-learning methods for small training sets, and the hybrid strategy clearly outperforms the rest, according to Table 1.
Fig. 4. (a) Boxplot and (b) correlation matrix corresponding to the six innovative proposed RNFL thickness features (min T_i, max T_i, t_{i,1}, t_{i,2}, t_{i,3}, t_{i,4}) for glaucomatous and healthy samples.
Table 1.
Quantitative results reached during the ICV stage from all approaches.
        HDL approach    DL approach    Hybrid approach
SN      0.802 ± …       … ± …          … ± …
SPC     … ± …           … ± …          … ± …
FS      … ± …           … ± …          … ± …
ACC     … ± …           … ± …          … ± …
AUC     … ± …           … ± …          … ± …

Test results. In this section, we detail an external validation of the three proposed models using the independent test set, as previously explained in Fig. 3. The classification results corresponding to the test set are exposed in Table 2. We can observe that, in line with the ICV stage, hand-driven learning provides a slight improvement with regard to the deep-learning approach, which reaches values around 0.7 for all measures. Additionally, the hybrid methodology, characterised by the fusion of the features extracted from both ML and DL models, reports the most promising results for almost all figures of merit. It is important to note that an objective comparison with other state-of-the-art studies is not possible because all of them were performed on private databases or using another kind of input data, such as RNFL thickness probability maps or visual field tests.
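The AUC figures reported in the tables can be computed without an explicit ROC sweep via the rank-sum (Mann-Whitney) identity: the AUC equals the probability that a randomly chosen glaucomatous sample scores higher than a randomly chosen healthy one, with ties counting half. A minimal sketch:

```python
import numpy as np

def auc_score(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparisons between the
    classifier scores of positive (glaucomatous) and negative (healthy)
    samples; ties contribute 0.5."""
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.asarray(scores_neg, dtype=float)
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

The O(n·m) pairwise form is fine at this database's scale; for large sets the same quantity is obtained from the rank sum of the positive scores.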
Table 2.
Results comparison between the proposed models during the prediction stage.
        HDL model    DL model    Hybrid model
SN      0.7632       …           …
SPC     …            …           …
FS      …            …           …
ACC     …            …           0.82
AUC     …            …           0.85
In this work, three different learning methodologies have been proposed with the aim of elucidating that, under specific circumstances, hand-driven learning approaches can outperform deep-learning algorithms. The reported results evidenced that a combination of hand-crafted and data-learning strategies can improve the models' performance, especially for small databases.
References
1. Weinreb, R.N., Khaw, P.T.: Primary open-angle glaucoma. The Lancet (9422) (2004) 1711–1720
2. Jonas, J.B., Aung, T., Bourne, R.R., Bron, A.M., Ritch, R., Panda-Jonas, S.: Glaucoma – authors' reply. The Lancet (10122) (2018) 740
3. National Guideline Alliance (UK): Glaucoma: diagnosis and management. (2017)
4. Bizios, D., Heijl, A., Hougaard, J.L., Bengtsson, B.: Machine learning classifiers for glaucoma diagnosis based on classification of retinal nerve fibre layer thickness parameters measured by Stratus OCT. Acta Ophthalmologica (1) (2010) 44–52
5. Asaoka, R., Hirasawa, K., Iwase, A., et al.: Validating the usefulness of the random forests classifier to diagnose early glaucoma with optical coherence tomography. American Journal of Ophthalmology (2017) 95–103
6. Kim, S.J., Cho, K.J., Oh, S.: Development of machine learning models for diagnosis of glaucoma. PLoS One (5) (2017) e0177726
7. Diaz-Pinto, A., Colomer, A., Naranjo, V., Morales, S., Xu, Y., Frangi, A.F.: Retinal image synthesis and semi-supervised learning for glaucoma assessment. IEEE Transactions on Medical Imaging (2019)
8. Medeiros, F.A., Jammal, A.A., Thompson, A.C.: From machine to machine: an OCT-trained deep learning algorithm for objective quantification of glaucomatous damage in fundus photographs. Ophthalmology (4) (2019) 513–521
9. Muhammad, H., Fuchs, T.J., De Cuir, N., De Moraes, C.G., et al.: Hybrid deep learning on single wide-field optical coherence tomography scans accurately classifies glaucoma suspects. Journal of Glaucoma (12) (2017) 1086
10. Wang, P., Shen, J., Chang, R., Moloney, M., Torres, M., Burkemper, B., et al.: Machine learning models for diagnosing glaucoma from retinal nerve fiber layer thickness maps. Ophthalmology Glaucoma (6) (2019) 422–428
11. Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics (6) (1973) 610–621
12. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence (7) (2002) 971–987
13. Guo, Z., Zhang, L., Zhang, D.: A completed modeling of local binary pattern operator for texture classification. IEEE Transactions on Image Processing 19