Analysis of Hand-Crafted and Automatic-Learned Features for Glaucoma Detection Through Raw Circumpapillary OCT Images
Gabriel García, Adrián Colomer, Valery Naranjo. Instituto de Investigación e Innovación en Bioingeniería (I3B), Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia, Spain. [email protected]
Abstract.
Taking into account that glaucoma is the leading cause of blindness worldwide, we propose in this paper three different learning methodologies for glaucoma detection in order to elucidate that traditional machine-learning techniques could outperform deep-learning algorithms, especially when the image data set is small. The experiments were performed on a private database composed of 194 glaucomatous and 198 normal B-scans diagnosed by expert ophthalmologists. As a novelty, we only considered raw circumpapillary OCT images to build the predictive models, without using other expensive tests such as visual field and intraocular pressure measures. The results ratify that the proposed hand-driven learning model, based on novel descriptors, outperforms the automatic learning. Additionally, the hybrid approach consisting of a combination of both strategies reports the best performance, with an area under the ROC curve of 0.85 and an accuracy of 0.82 during the prediction stage.
Keywords:
Glaucoma, circumpapillary OCT, hand-driven learning, deep learning, hybrid classification.
Glaucoma is a chronic optic neuropathy characterised by causing several visual field defects and structural changes in the optic nerve, such as a thinning of the retinal nerve fibre layer (RNFL) [1]. Nowadays, this degenerative disease is the leading cause of blindness worldwide and is expected to affect 111.8 million people in 2040 [2]. The glaucoma diagnosis includes different expensive analyses (pachymetry, tonometry and visual field tests, among others) besides a subjective interpretation of expert ophthalmologists who often differ, especially in terms of early identification [3]. Currently, imaging techniques based on fundus images and optical coherence tomography (OCT) have become a powerful tool to address the glaucoma diagnosis.
Related work.
Timely treatment of glaucoma is essential to avoid irreversible vision loss [2], so several computer-aided diagnosis systems and predictive algorithms focused on OCT and fundus images have been proposed in the literature to achieve early detection. Most of them were performed through traditional machine-learning (ML) techniques based on feature extraction and selection methods [4–6]. All of them had in common the use of additional parameters relevant for glaucoma diagnosis, such as the intraocular pressure (IOP) and visual field (VF) tests, besides the OCT images. Unlike these works, we propose an innovative end-to-end system able to predict the glaucoma disease just from raw circumpapillary OCT images, without taking into account external expensive tests, like VF or IOP. We aim to elucidate the added value that these OCT samples around the optic nerve head (ONH) of the retina can provide for glaucoma detection. In recent years, the overwhelming irruption of deep learning (DL) has replaced the traditional hand-crafted methods, but most of the state-of-the-art studies used fundus images [7, 8] or RNFL thickness probability maps extracted by combining fundus images and OCT B-scans [9, 10]. However, to the best of the authors' knowledge, we are the first to apply deep learning to evidence glaucoma just from raw circumpapillary OCT images.
Contribution of this work.
In this paper, we propose a comparison between traditional and contemporary machine-learning models to analyse whether hand-driven approaches can outperform deep-learning algorithms for glaucoma detection, especially when addressing small databases. We hypothesise that CNNs cannot replace the way in which humans encode information from a subjective point of view, in the same way that CNNs are able to identify hidden patterns that are not within reach of the human eye. For this reason, as the main novelty, we propose a hybrid model by fusing the features extracted from both ML and DL approaches to identify glaucoma.
The present work was carried out making use of a private database, coming from the Oftalvist Ophthalmic Clinic, which consists of 392 B-scans around the ONH of the retina. In particular, 194 samples from 97 patients were diagnosed by expert ophthalmologists as glaucomatous, whereas 198 circumpapillary images from 99 patients were associated with normal eyes. Note that each B-scan, of dimensions M × N = 496 × 768 pixels, was acquired using the Heidelberg Spectralis OCT system, which allows obtaining an axial resolution of 4-5 µm.

In order to provide reliable results, we carried out a patient-based partitioning of the database to separate the training set from the independent test set. In particular, 314 samples (158 healthy and 156 glaucomatous OCT images) were used to train the predictive models, whereas the remaining 78 (40 healthy and 38 glaucomatous B-scans) were employed to test them. Additionally, we applied an internal k = 5-fold cross-validation technique intending to manage the overfitting and select the best parameters during the validation stage. Finally, we used the entire training set to build the final predictive models, which were assessed on the external set.

This approach consists of three main phases corresponding to the feature extraction, feature selection and classification through several traditional ML classifiers. Note that we used the RNFL and retina structure segmentations extracted by the Heidelberg Spectralis system to perform this methodology.
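The patient-based partitioning described above (both eyes/scans of a patient always fall on the same side of the split) can be sketched as follows; a minimal illustration with numpy, where the function name and the 20% test fraction are assumptions for the example:

```python
import numpy as np

def patient_based_split(patient_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no patient appears in both sets."""
    rng = np.random.default_rng(seed)
    patients = np.unique(patient_ids)
    rng.shuffle(patients)
    n_test = int(round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    return train_idx, test_idx
```

Splitting at the patient level rather than the image level avoids the leakage that would arise if two scans of the same eye landed in both training and test sets.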
Feature extraction. For each circumpapillary OCT image I_i, being i = {1, 2, ..., P} and P = 392 the number of B-scans, we combine, as a novelty, different variables related to four main descriptors: RNFL thickness, texture variables, fractal analysis and demographic data (age and gender). Regarding the RNFL thickness, we propose in this paper an innovative way of codifying the information, unlike [5, 6] where the authors employed the measures directly extracted from the hardware system. Let T_i = {t_1, t_2, ..., t_j, ..., t_N} be the vector where each t_j ∈ T_i is the RNFL thickness calculated from the image I_i at position j; the proposed method groups the thickness values into a histogram vector h_{i,p}, where p = {1, 2, 3, 4}. For each value of p, h_{i,p} quantifies the number of thickness values t_j whose distance is ranged between D_p and D_{p+1}, being D = [0, ..., maxT_i) the vector of bin boundaries. The minimum (minT_i) and maximum (maxT_i) RNFL thickness values were also calculated for this descriptor category. Concerning the texture variables, the grey-level co-occurrence matrix (GLCM) [11] and local binary patterns (LBP) were applied for encoding the textural information contained in the retina region of each I_i. Variables such as contrast, correlation, energy, homogeneity, entropy, mean and standard deviation were calculated from a GLCM of dimensions 8 × 8, which quantifies how many times a pixel with intensity x is adjacent to another with intensity y. Additionally, in order to recognise local texture information, we combined the rotation-invariant uniform LBP^{riu2}_{P,R} operator, proposed in [12], with the rotation-invariant local variance
VAR descriptor, to finally compute the LBP variance (LBPV) histogram, proposed in [13] (see Fig. 1). It should be noted that similar texture descriptors have been previously considered for glaucoma detection from fundus images [14, 15] but, to the best of the authors' knowledge, this is the first time that GLCM and LBP features are applied to circumpapillary OCT images. Additionally, we analysed the fractal dimension in five directions (0, 30, 45, 60, 90 degrees) via the Hurst exponent [16] computation to determine the presence of underlying trends in the complexity of the retinal region of each I_i. After the feature extraction phase, 75 hand-crafted variables per learning instance were taken into account to address the next stage.
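The histogram-based RNFL thickness descriptor described above can be sketched as follows; a minimal illustration with numpy, where the bin boundaries D passed in the test are hypothetical, since the paper elides the exact distances:

```python
import numpy as np

def rnfl_thickness_features(thickness, edges):
    """Histogram-based RNFL descriptor: count of thickness values t_j per
    distance bin [D_p, D_{p+1}), plus the minimum and maximum thickness,
    yielding the six features shown in Fig. 4."""
    t = np.asarray(thickness, dtype=float)
    counts, _ = np.histogram(t, bins=edges)   # 4 bin counts h_{i,1..4}
    return np.concatenate([counts, [t.min(), t.max()]])
```

Compared with feeding the raw sector thicknesses to a classifier, the histogram compresses the N thickness samples into a fixed-length, position-independent summary.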
Fig. 1. (a-b) Examples of glaucomatous and normal samples. (c-d) LBP images. (e-f) VAR images. (g) 10-bin histograms of the LBPV operator from the LBP and VAR images.
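For the GLCM branch of the texture analysis, a simplified pure-numpy sketch is given below: it quantises the image to 8 grey levels, accumulates horizontally adjacent pixel pairs, and derives three of the Haralick statistics named above (the single horizontal offset and the quantisation scheme are assumptions; the paper does not state them):

```python
import numpy as np

def glcm_features(img, levels=8):
    """Grey-level co-occurrence matrix for horizontal neighbours and a few
    Haralick-style statistics (contrast, energy, homogeneity)."""
    # Quantise intensities to `levels` grey levels.
    q = (img.astype(float) / (img.max() + 1e-9) * (levels - 1)).round().astype(int)
    glcm = np.zeros((levels, levels))
    # Count how many times a pixel of level x is left-adjacent to level y.
    for x, y in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[x, y] += 1
    glcm /= glcm.sum()                        # normalise to a joint probability
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()
    energy = (glcm ** 2).sum()
    homogeneity = (glcm / (1.0 + np.abs(i - j))).sum()
    return contrast, energy, homogeneity
```

A full implementation would symmetrise the matrix and average over several offsets and directions, and add the remaining statistics (correlation, entropy, mean, standard deviation).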
Feature selection. An in-depth statistical analysis was carried out to select the most relevant variables in order to feed the proposed classifiers. Initially, a Kolmogorov-Smirnov test was applied to determine the distribution of the variables. Then,
Student's t-tests or Mann-Whitney U tests were performed to analyse the discriminatory ability of each variable v by comparing means or medians, respectively, depending on whether v followed a normal distribution or not. The correlation coefficient was also calculated to obtain the grade of independence between pairs of variables. Note that a level of significance α = 0.05 was defined for both hypothesis contrasts to discard the non-relevant features, as well as the redundant information when p-value < α.

Model training. Once the most relevant features were selected, we trained different ML classifiers such as support vector machines (SVM) and multi-layer perceptrons (MLP), in line with the state-of-the-art studies [4] and [6]. Several box constraints and kernel scale parameters for the SVM classifiers, and learning rates, loss functions, optimizers and network structures for the MLP network, were considered during the internal cross-validation (ICV) stage. A considerable outperformance of the MLP classifier with respect to the SVM was achieved using the adaptive gradient descent optimizer with a learning rate of 0.001 and the binary cross-entropy as a loss function. Concerning the network structure, one hidden layer with 8 neurons reported the best model performance. Note that the proposed hand-driven learning approach is represented by the blue lines in the flowchart exposed in Fig. 3.
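The normality-then-significance filtering described above can be sketched with scipy; a minimal version under the stated α = 0.05, which omits the correlation-based redundancy filtering for brevity (the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def select_features(X_pos, X_neg, alpha=0.05):
    """Keep feature indices whose class-wise distributions differ
    significantly: Kolmogorov-Smirnov normality check on the z-scored
    pooled sample, then Student's t-test (normal case) or Mann-Whitney
    U test (non-normal case) at significance level alpha."""
    keep = []
    for k in range(X_pos.shape[1]):
        a, b = X_pos[:, k], X_neg[:, k]
        pooled = np.concatenate([a, b])
        z = (pooled - pooled.mean()) / (pooled.std() + 1e-12)
        normal = stats.kstest(z, "norm").pvalue > alpha
        if normal:
            p = stats.ttest_ind(a, b).pvalue
        else:
            p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
        if p < alpha:
            keep.append(k)
    return keep
```

A production version would additionally drop one member of each highly correlated feature pair, as the text describes.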
Similarly to the previous phase, an empirical exploration of several hyperparameters was performed in order to build the best predictive deep-learning model during the ICV stage. Convolutional, pooling, dropout, batch normalisation and dense layers were applied in specific experimental combinations in search of the best network architecture. Also, we considered the use of data augmentation techniques to alleviate the problem of insufficient data, by creating artificial samples via geometric and intensity modifications of the original images. The best performance was achieved by training the CNN exposed in Fig. 2, using the Adadelta optimizer with a learning rate of 0.005, squared hinge as a loss function and a batch size of 32 during the ICV stage.

Fig. 2. Illustrative representation of the implemented CNN architecture (convolutional, max pooling, global max pooling and softmax layers; feature maps of 228 × 384 × 1, 228 × 384 × 32, 124 × 192 × 32, 124 × 192 × 64, 62 × 96 × 64 and 62 × 96 × 128, followed by a 128-dimensional descriptor).
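The geometric and intensity modifications mentioned above can be illustrated with a minimal numpy sketch; the specific transforms and their ranges are assumptions for the example, since the paper does not enumerate them:

```python
import numpy as np

def augment(img, rng):
    """Create one artificial training sample from an OCT B-scan via a
    random horizontal flip, a small vertical shift, and a global additive
    intensity offset, clipped back to the 8-bit range."""
    out = img.copy().astype(float)
    if rng.random() < 0.5:
        out = out[:, ::-1]           # geometric: horizontal flip
    shift = rng.integers(-5, 6)      # geometric: vertical shift in pixels
    out = np.roll(out, shift, axis=0)
    out += rng.uniform(-10, 10)      # intensity: global brightness offset
    return np.clip(out, 0, 255)
```

Each call produces a slightly different sample, so the effective training set grows without collecting new scans.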
As the main novelty of this paper, we propose a hybrid model to address glaucoma detection taking into account both the hand-crafted and the automatically learned features. Our aim is to combine the original human point of view with the hidden potential enclosed in the CNNs. Specifically, we made use of the previously defined deep-learning base model as a feature extractor for each OCT image. Then, we fused the 75 ML and 128 DL extracted features to form the final feature vector, from which we performed, in the same conditions, the feature selection and MLP training stages carried out in Section 3.2. Finally, the three proposed models were assessed and compared using the test set, according to the flowchart exposed below. Note that the information relative to the hybrid approach can be interpreted by the yellow lines in Fig. 3.
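Conceptually, the fusion step reduces to a per-sample concatenation of the two descriptor sets into a single 203-dimensional vector (75 + 128); a minimal sketch, with a hypothetical function name:

```python
import numpy as np

def fuse_features(ml_feats, dl_feats):
    """Concatenate the 75 hand-crafted and 128 CNN-extracted descriptors
    into one feature vector per OCT image; the fused matrix then feeds
    the same feature-selection and MLP-training stages."""
    ml = np.atleast_2d(ml_feats)
    dl = np.atleast_2d(dl_feats)
    assert ml.shape[0] == dl.shape[0], "one row per OCT image"
    return np.hstack([ml, dl])
```

In practice the two feature groups should be scaled comparably (e.g. standardised) before fusion, so that neither dominates the subsequent statistical selection.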
Regarding the hand-driven learning approach, 25 from a total of 75 features that composed a learning instance were selected after the statistical analysis.
Fig. 3.
Flowchart detailing the proposed ML, DL and hybrid approaches, in blue, green and yellow, respectively.
Otherwise, concerning the hybrid approach, 100 features from a total of 203 were reported as relevant variables to address the MLP training stage. Note that both the ML and the hybrid final feature vectors included variables corresponding to the four kinds of descriptors used in this work. In addition, all the proposed features corresponding to the new RNFL thickness histogram-based method resulted statistically significant. A boxplot relative to these features is exposed in Fig. 4 to show the discriminatory ability of the proposed new descriptor. Besides, we also represent the correlation matrix of the same variables to evidence the independence level between them.

Validation results. Classification results reached during the validation stage are detailed in Table 1 to objectively compare the proposed hand-driven learning (HDL), deep-learning (DL) and hybrid-learning methodologies. Different figures of merit, such as sensitivity (SN), specificity (SPC), F-score (FS), accuracy (ACC) and area under the ROC curve (AUC), are taken into account to assess the models, providing reliable results. The findings are directly in line with the hypothesis postulated in Section 1, since the hand-driven learning approach has demonstrated to surpass deep-learning methods for small training sets, and the hybrid strategy clearly outperforms the rest, according to Table 1.
Fig. 4. (a) Boxplot and (b) correlation matrix corresponding to the six innovative proposed RNFL thickness features (min T_i, max T_i, t_{i,1}, t_{i,2}, t_{i,3}, t_{i,4}) for glaucomatous and healthy samples.
Table 1.
Quantitative results reached during the ICV stage from all approaches.
        HDL approach    DL approach    Hybrid approach
SN      0.802 ± …       … ± …          … ± …
SPC     … ± …           … ± …          … ± …
FS      … ± …           … ± …          … ± …
ACC     … ± …           … ± …          … ± …
AUC     … ± …           … ± …          … ± …

Test results. In this section, we detail an external validation of the three proposed models using the independent test set, as previously explained in Fig. 3. The classification results corresponding to the test set are exposed in Table 2. We can observe that, in line with the ICV stage, hand-driven learning provides a slight improvement with regard to the deep-learning approach, which reaches values around 0.7 for all measures. Additionally, the hybrid methodology, characterised by the fusion of the features extracted from both ML and DL models, reports the most promising results for almost all figures of merit. It is important to note that an objective comparison with other state-of-the-art studies is not possible because all of them were performed on private databases or using another kind of input data, such as RNFL thickness probability maps or visual field tests.
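The AUC figures reported in the tables can be computed without an explicit ROC sweep via the rank-sum (Mann-Whitney) identity: the AUC equals the probability that a randomly chosen glaucomatous sample scores higher than a randomly chosen healthy one, with ties counting half. A minimal sketch:

```python
import numpy as np

def auc_score(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparisons between the
    classifier scores of positive (glaucomatous) and negative (healthy)
    samples; ties contribute 0.5."""
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.asarray(scores_neg, dtype=float)
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

The O(n·m) pairwise form is fine at this database's scale; for large sets the same quantity is obtained from the rank sum of the positive scores.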
Table 2.
Results comparison between the proposed models during the prediction stage.
        HDL model    DL model    Hybrid model
SN      0.7632       …           …
SPC     …            …           …
FS      …            …           …
ACC     …            …           0.82
AUC     …            …           0.85
In this work, three different learning methodologies have been proposed with the aim of elucidating that, under specific circumstances, hand-driven learning approaches can outperform deep-learning algorithms. The reported results evidenced that a combination of hand-crafted and data-learning strategies can improve the models' performance, especially for small databases.
References
1. Weinreb, R.N., Khaw, P.T.: Primary open-angle glaucoma. The Lancet (9422) (2004) 1711–1720
2. Jonas, J.B., Aung, T., Bourne, R.R., Bron, A.M., Ritch, R., Panda-Jonas, S.: Glaucoma – authors' reply. The Lancet (10122) (2018) 740
3. National Guideline Alliance (UK): Glaucoma: diagnosis and management. (2017)
4. Bizios, D., Heijl, A., Hougaard, J.L., Bengtsson, B.: Machine learning classifiers for glaucoma diagnosis based on classification of retinal nerve fibre layer thickness parameters measured by Stratus OCT. Acta Ophthalmologica (1) (2010) 44–52
5. Asaoka, R., Hirasawa, K., Iwase, A., et al.: Validating the usefulness of the random forests classifier to diagnose early glaucoma with optical coherence tomography. American Journal of Ophthalmology (2017) 95–103
6. Kim, S.J., Cho, K.J., Oh, S.: Development of machine learning models for diagnosis of glaucoma. PLoS One (5) (2017) e0177726
7. Diaz-Pinto, A., Colomer, A., Naranjo, V., Morales, S., Xu, Y., Frangi, A.F.: Retinal image synthesis and semi-supervised learning for glaucoma assessment. IEEE Transactions on Medical Imaging (2019)
8. Medeiros, F.A., Jammal, A.A., Thompson, A.C.: From machine to machine: an OCT-trained deep learning algorithm for objective quantification of glaucomatous damage in fundus photographs. Ophthalmology (4) (2019) 513–521
9. Muhammad, H., Fuchs, T.J., De Cuir, N., De Moraes, C.G., et al.: Hybrid deep learning on single wide-field optical coherence tomography scans accurately classifies glaucoma suspects. Journal of Glaucoma (12) (2017) 1086
10. Wang, P., Shen, J., Chang, R., Moloney, M., Torres, M., Burkemper, B., et al.: Machine learning models for diagnosing glaucoma from retinal nerve fiber layer thickness maps. Ophthalmology Glaucoma (6) (2019) 422–428
11. Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics (6) (1973) 610–621
12. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence (7) (2002) 971–987
13. Guo, Z., Zhang, L., Zhang, D.: A completed modeling of local binary pattern operator for texture classification. IEEE Transactions on Image Processing 19