Multi-level SVM Based CAD Tool for Classifying Structural MRIs

Abstract

The revolutionary developments in the field of supervised machine learning have paved the way for the development of CAD tools for assisting doctors in diagnosis. Recently, supervised learning has been employed in the prediction of neurological disorders such as Alzheimer's disease. We propose a CAD (Computer Aided Diagnosis) tool for differentiating neural lesions caused by CVA (Cerebrovascular Accident) from lesions caused by other neural disorders, using Non-negative Matrix Factorisation (NMF) and Haralick features for feature extraction and an SVM (Support Vector Machine) for pattern recognition. We also introduce a multi-level classification system that has better classification efficiency, sensitivity and specificity than systems using NMF or Haralick features alone. Cross-validation was performed using the LOOCV (Leave-One-Out Cross Validation) method, and the proposed system has a classification accuracy of over 86%.

Jerrin Thomas Panachakel and Jeena R.S.

Jerrin Thomas Panachakel, Dept. of Electrical Engineering, Indian Institute of Science, Bangalore
Jeena R.S., Dept. of Electronics and Communication Engineering, College of Engineering, Trivandrum

Cerebrovascular accident (CVA) or cerebrovascular insult (CVI), commonly referred to as "stroke", is the second leading cause of death, after ischaemic heart disease, and the leading cause of adult disability worldwide [1, 2]. Globally, 15 million people suffer a stroke every year; of these, a third die and half of the remainder struggle with permanent disabilities [3]. The statistics are no different for developing countries like India [4, 5, 6]. In Trivandrum, the capital of the Indian state of Kerala, the annual incidence rate of stroke is 135.0 and 138.0 per 100,000 inhabitants in the urban and rural communities respectively [5]. Clearly, stroke has transformed from a disease pertaining to developed nations into a global hazard.

It was these facts which motivated us to develop a prediction system for CVA. As a precursor to the proposed CAD (Computer Aided Diagnosis) tool for prediction, we have developed a CAD tool for differentiating brain MRIs of subjects who have suffered a stroke (hereafter referred to as "stroke MRI") from the MRIs of subjects suffering from other neural disorders that can be diagnosed from structural MRIs (hereafter referred to as "non-stroke MRI"). To the best of the authors' knowledge, this is the first work in the field of stroke prediction using neuroimaging, though there are several similar works in the literature pertaining to other neurological disorders, such as schizophrenia [7] and Alzheimer's disease [8].

In this work, we have used the Haralick features [9] and Non-negative Matrix Factorisation [10] for feature extraction and SVM for classification. We have also introduced a computationally efficient method for combining feature vectors which are linear in two different kernel spaces. For this, we use the distance of the feature vector from the hyperplane as a measure of the confidence of classification, thus improving on the classification efficiency obtained by using either of the two feature vectors individually.

The rest of the paper is organized as follows: Section 2 gives the details of the database used in this work. Section 3 briefly discusses the various features used for classification. Section 4 describes the multi-level classification approach. The performance metrics used are discussed in Section 5. Finally, the results are given in Section 6.

The database used for this work is "The Whole Brain Atlas", developed by Keith Johnson, MD, and Alex Becker, PhD, with the support of the Brigham and Women's Hospital Departments of Radiology and Neurology, Harvard Medical School, the Countway Library of Medicine, and the American Academy of Neurology. The database includes MRI images of neurological diseases such as:

1. Neoplastic Disease (brain tumor)
   • Metastatic adenocarcinoma
   • Metastatic bronchogenic carcinoma
   • Meningioma
   • Sarcoma
2. Degenerative Disease
   • Alzheimer's disease
   • Huntington's disease
   • Motor neuron disease
   • Cerebral calcinosis
3. Inflammatory or Infectious Disease
   • Multiple sclerosis
   • AIDS dementia
   • Creutzfeld-Jakob disease
   • Cerebral Toxoplasmosis
4. Cerebrovascular Accident (CVA)
   • Acute stroke: Speech arrest
   • Acute stroke: "alexia without agraphia"
   • Subacute stroke: "transcortical aphasia"
   • Chronic subdural hematoma
   • Hypertensive encephalopathy
   • Cerebral hemorrhage

Fig. 1 An example of preprocessing the MRI image. (a) Original MRI. (b) Preprocessed MRI.

A total of 30 T2-weighted MRI images from the database were used in this work. Of these, 16 were non-stroke MRIs and 14 were stroke MRIs. All images were manually orientation-corrected, cropped and resampled to a resolution of 256 × .

1% pixel intensity values, as proposed in [8] and [11]. The images were further processed to reduce noise using a bilayer filter [12, 13]. The original and preprocessed images are shown in Fig. 1.

We have relied on two sets of features for classification: (1) textural features and (2) features extracted using NMF.

The textural features introduced by Robert M. Haralick in the 1973 paper "Textural features for image classification" [9] are among the most important feature extraction techniques for images in which texture plays a pivotal role. These features are generic in nature: they were not developed for a specific imaging modality but can be used for a wide range of images, including biomedical images, where texture is an important property due to the intrinsic spatial tonal variations. The features are computed from a matrix referred to as the gray-tone spatial dependence matrix, given by:

$$
G = \begin{bmatrix}
c(1,1) & c(1,2) & \cdots & c(1,N_g) \\
c(2,1) & c(2,2) & \cdots & c(2,N_g) \\
\vdots & \vdots & \ddots & \vdots \\
c(N_g,1) & c(N_g,2) & \cdots & c(N_g,N_g)
\end{bmatrix} \tag{1}
$$

For an image with $N_g$ gray levels, the matrix is a square matrix of dimension $N_g$. The element at location $(i, j)$ is the number of pixels with value $i$ in the neighbourhood of a pixel with value $j$. This can be converted to a probability matrix by dividing by the appropriate count. As shown in Fig. 2, the neighbourhood can be defined in four different ways, viz. horizontal, vertical, left diagonal and right diagonal. Hence, four different gray-tone spatial dependence matrices are possible for each image.

Fig. 2 The four possible directional adjacencies. (a) Horizontal. (b) Vertical. (c) Left diagonal. (d) Right diagonal.
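As a rough illustration of how the four directional matrices might be built, the sketch below counts neighbouring gray-level pairs for each adjacency. It makes several assumptions that are not stated in the text: gray levels are indexed from 0 rather than 1, neighbours are taken at a distance of one pixel, each pair is counted in both orders so that the matrix is symmetric (as in Haralick's original formulation [9]), and the naming of the two diagonals is arbitrary. The function names are hypothetical.

```python
# Offsets (row step, column step) for the four adjacency directions.
# Which diagonal is called "left" or "right" is an assumption of this sketch.
OFFSETS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "left_diagonal": (1, 1),
    "right_diagonal": (1, -1),
}


def cooccurrence(image, n_levels, offset):
    """Symmetric gray-tone spatial dependence matrix for one direction.

    image: list of rows of integer gray levels in 0..n_levels-1.
    Entry [i][j] counts neighbouring pixel pairs with values (i, j);
    every pair is counted in both orders.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * n_levels for _ in range(n_levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1
    return counts


def normalise(counts):
    """Turn pair counts into a probability matrix that sums to 1."""
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


# Small 4 x 4 test image with four gray levels (toy data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcms = {name: cooccurrence(image, 4, off) for name, off in OFFSETS.items()}
probabilities = {name: normalise(m) for name, m in glcms.items()}
```

For a real 256-level MRI slice, a vectorised implementation would be preferable; the nested loops here are only meant to make the counting rule explicit.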

Fourteen different statistics are computed from each of the four gray-tone spatial dependence matrices using the following relations:

1. Angular Second Moment:
$$\sum_i \sum_j c(i,j)^2 \tag{2}$$

2. Contrast:
$$\sum_{n=0}^{N_g-1} n^2 \left\{ \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} c(i,j) \right\}, \quad |i-j| = n \tag{3}$$

3. Correlation:
$$\frac{\sum_i \sum_j (ij)\, c(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{4}$$

4. Sum of Squares: Variance:
$$\sum_i \sum_j (i-\mu)^2 c(i,j) \tag{5}$$

5. Inverse Difference Moment:
$$\sum_i \sum_j \frac{1}{1+(i-j)^2}\, c(i,j) \tag{6}$$

6. Sum Average:
$$\sum_{i=2}^{2N_g} i\, p_{x+y}(i) \tag{7}$$

7. Sum Variance:
$$\sum_{i=2}^{2N_g} (i-f_s)^2 p_{x+y}(i) \tag{8}$$

8. Sum Entropy:
$$-\sum_{i=2}^{2N_g} p_{x+y}(i) \log\{p_{x+y}(i)\} \tag{9}$$

9. Entropy:
$$-\sum_i \sum_j c(i,j) \log(c(i,j)) \tag{10}$$

10. Difference Variance:
$$\sum_{i=0}^{N_g-1} i^2 p_{x-y}(i) \tag{11}$$

11. Difference Entropy:
$$-\sum_{i=0}^{N_g-1} p_{x-y}(i) \log\{p_{x-y}(i)\} \tag{12}$$

12. Information Measure of Correlation 1:
$$\frac{HXY - HXY1}{\max\{HX, HY\}} \tag{13}$$

13. Information Measure of Correlation 2:
$$\left(1 - \exp[-2(HXY2 - HXY)]\right)^{1/2} \tag{14}$$

14. Maximal Correlation Coefficient:
$$(\text{second largest eigenvalue of } Q)^{1/2} \tag{15}$$

where $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ are the means and standard deviations of $p_x$ and $p_y$, the probability density functions. $Q(i,j)$ is given by the following relation:

$$Q(i,j) = \sum_k \frac{c(i,k)\, c(j,k)}{p_x(i)\, p_y(k)} \tag{16}$$

Since each gray-tone spatial dependence matrix corresponds to a specific direction, the features obtained may vary with the orientation of the image. To obtain some level of orientation independence, we took the mean and range of each feature over the four directions, giving a total of 28 features per image instead of 56.

One of the major drawbacks of feature extraction techniques such as PCA is that negative values in the basis vectors are difficult to interpret in many practical applications. One remedy is to use a representation in which non-negativity is imposed. Given a non-negative matrix $A$, the task is to decompose it into two non-negative matrices $V$ and $H$ such that the Frobenius norm of the difference between $A$ and the product $VH$ is minimum:

$$\min_{V \ge 0,\; H \ge 0} \| A - VH \|_F \tag{17}$$

This decomposition is known as Non-negative Matrix Factorisation (NMF). The matrix $V$ is called the basis matrix or the mixing matrix, and $H$ represents the unknown or hidden sources. The beauty of (17) lies in viewing each column of $A$ as a linear combination of the columns of $V$, with the weights given by the components of the corresponding column of $H$ [14, 15, 16]. The number of columns in $V$ is much smaller than the number of rows in $A$. Though NMF appears to be a computationally complex operation, several algorithms have been developed in the last decade for its computationally efficient implementation.

The first work in the field of NMF can be traced back to a 1994 paper by Paatero and Tapper [10], in which they performed factor analysis on environmental data [17]. Their aim was to find the common latent features or latent variables that explained the given set of observation vectors. Some elementary variables combine positively to give each factor. A factor can either be present, in which case it has a positive effect, or absent, in which case it has no influence. Clearly, there is no room for a "negative" influence, and hence it often makes sense to constrain the factors to be non-negative.
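The factorisation in (17) can be approximated in a few lines of code. The sketch below uses the multiplicative update rules of Lee and Seung [14]; the text does not say which NMF algorithm was actually used in this work, so this choice is an assumption, as are the function name `nmf`, the iteration count, the small constant `eps` and the toy matrix `A`.

```python
import numpy as np


def nmf(A, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative m x n matrix A as V @ H with V, H >= 0.

    V (m x rank) plays the role of the basis matrix and H (rank x n)
    holds the weights of each column of A in that basis.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    V = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: each factor is rescaled element-wise,
        # so entries that start non-negative stay non-negative.
        H *= (V.T @ A) / (V.T @ V @ H + eps)
        V *= (A @ H.T) / (V @ H @ H.T + eps)
    return V, H


# Toy non-negative data matrix (made up for illustration).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0],
              [3.0, 5.0, 7.0]])
V, H = nmf(A, rank=2)
error = np.linalg.norm(A - V @ H)  # Frobenius norm of the residual in (17)
```

In a setting like the one described here, each column of `A` would presumably hold one vectorised image, and the corresponding column of `H` would serve as its NMF feature vector.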

SVM is a widely used supervised machine learning algorithm for classification developed by Vladimir N. Vapnik; the current standard incarnation (soft margin) was proposed by Corinna Cortes and Vapnik in 1995 [18]. An in-depth discussion of SVM classification can be found in [19]. In our proposed multi-level classification, two support vector models are created, one using NMF features and one using Haralick features. Given a feature vector $\Phi(\vec{x})$ in the feature space $\Phi(\cdot)$, we compute a score based on its distance from the decision boundary hyperplane $f(x)$ as [20]:

$$\mathrm{dist}(\Phi(x), f(x)) = \frac{f(x)}{\left\| \sum_{i \in SV} y_i \alpha_i \Phi(x_i) \right\|} \tag{18}$$

where $y_i \in \{-1, 1\}$ are the class labels and $\alpha_i$ are the weights of the support vectors $x_i$.

The scores provide an estimate of how reliable the classification is. The larger the score, the larger the distance from the hyperplane and hence the higher the probability that the sample lies in that class [20]. In our work, for a given test sample, we compute the scores for both support vector models. The model which gives the highest absolute score is assumed to have classified the sample correctly. This approach improved the accuracy of the classification.

Three performance metrics were used in this work:

1. Sensitivity (SN), which is a measure of the system's ability to identify stroke MRIs.
2. Specificity (SP), which is a measure of the system's ability to identify non-stroke MRIs.
3. Accuracy (AC), which is a measure of the system's net classification efficiency.

Before defining these metrics mathematically, we introduce the following terms:

• TP: True Positive, stroke MRI identified as stroke MRI.
• TN: True Negative, non-stroke MRI identified as non-stroke MRI.
• FP: False Positive, non-stroke MRI identified as stroke MRI.
• FN: False Negative, stroke MRI identified as non-stroke MRI.

Now, we can mathematically define sensitivity, specificity and accuracy as:

$$SN = \frac{TP}{TP + FN} \tag{19}$$

$$SP = \frac{TN}{TN + FP} \tag{20}$$

$$AC = \frac{TP + TN}{TN + TP + FP + FN} \tag{21}$$

The confusion matrix used in this work is shown in Fig. 3.

Fig. 3 Sample confusion matrix
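The three metrics follow directly from the four confusion-matrix counts. The short sketch below evaluates them for one set of counts. The counts are not reported in the paper; they are inferred here as one combination that is consistent with 14 stroke and 16 non-stroke MRIs and reproduces the linear-kernel percentages in Table 2. The function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)                 # stroke MRIs correctly identified
    specificity = tn / (tn + fp)                 # non-stroke MRIs correctly identified
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # all correct decisions
    return sensitivity, specificity, accuracy


# Inferred (not reported) counts: 14 stroke MRIs = TP + FN, 16 non-stroke = TN + FP.
sn, sp, ac = classification_metrics(tp=11, tn=14, fp=2, fn=3)
percentages = [round(100 * v, 2) for v in (sn, sp, ac)]
```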

LOOCV (Leave-One-Out Cross Validation), which is a special case of LpOCV (Leave-p-Out Cross Validation), was used for cross-validation. The steps adopted were:

1. Step 1: Out of the N samples with ground truth, N − 1 samples were used for training and the remaining sample was used as the test data. Depending on the classification result, TP, TN, FP or FN was incremented.
2. Step 2: In the next iteration, a different sample was chosen as the test data and Step 1 was repeated.
3. Step 3: This process was continued until every sample had been used as the test data once.
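The leave-one-out procedure can be sketched as a simple loop that trains on all samples but one and updates the confusion counts with the held-out prediction. To keep the example self-contained, a nearest-centroid rule stands in for the SVM; it is not the classifier used in this work, and the function names and feature values are made up for illustration.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier (not the paper's SVM): per-class feature means."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def loocv(X, y, fit, predict, positive=1):
    """Leave-one-out cross-validation; returns the TP, TN, FP, FN counts."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for k in range(len(X)):
        # Train on every sample except the k-th, then test on the k-th.
        X_train = X[:k] + X[k + 1:]
        y_train = y[:k] + y[k + 1:]
        model = fit(X_train, y_train)
        predicted = predict(model, X[k])
        if predicted == positive:
            counts["TP" if y[k] == positive else "FP"] += 1
        else:
            counts["FN" if y[k] == positive else "TN"] += 1
    return counts


# Toy, made-up feature vectors: label 1 = stroke, label 0 = non-stroke.
X = [[0.9, 1.0], [1.1, 0.8], [1.0, 1.2], [0.1, 0.2], [0.0, 0.1], [0.2, 0.0]]
y = [1, 1, 1, 0, 0, 0]
counts = loocv(X, y, nearest_centroid_fit, nearest_centroid_predict)
```

Passing `fit` and `predict` as arguments keeps the loop independent of the classifier, so an SVM training routine could be substituted without changing the cross-validation logic.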

Table 1 Classification results (%) when Haralick features alone are used

              Linear   MLP     RBF
Sensitivity   71.43    78.57   71.00
Specificity   87.50    81.25   81.25
Accuracy      80.00    80.00   76.67

Tables 1 and 2 show the results of using Haralick and NMF features independently for classification with different kernel functions. For MLP, the parameter was [ − ] and for RBF, the sigma value was 40. These values were found to give the best results.

Fig. 4 Confusion matrices for various features considered individually. (a) Confusion matrix when the NMF weight vector alone is used as the features. (b) Confusion matrix when Haralick features alone are used.

Table 2 Classification results (%) when the NMF weight vector alone is used as the features

              Linear   MLP     RBF
Sensitivity   78.57    64.29   71.43
Specificity   87.50    87.50   81.25
Accuracy      83.33    76.67   76.67

Table 3 Comparison (%) of multi-level SVM with simple SVM

              Haralick   NMF     Concatenated   Multi-Level
Sensitivity   78.57      78.57   64.29
Specificity   81.25      87.50   81.25
Accuracy      80.00      83.33   73.33

Fig. 5 Confusion matrices for various features considered simultaneously.

Fig. 6 Comparison of multi-level SVM with simple SVM

With the proposed multi-level classification, we could improve the accuracy, sensitivity and specificity beyond what is obtained by simply concatenating the features. The results are shown in Table 3, Fig. 5 and Fig. 6.
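The decision rule behind the multi-level classifier, trusting whichever of the two models places the sample farther from its hyperplane, can be sketched as follows. The example assumes linear decision functions, so that the norm of the weight vector is available directly; for non-linear kernels the norm would have to be computed from the kernel values of the support vectors. The weights, feature values, label coding (+1 for stroke, −1 for non-stroke) and tie-breaking rule are all hypothetical.

```python
import math


def linear_score(w, b, x):
    """Signed distance of x from the hyperplane w.x + b = 0 (linear kernel)."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f / math.sqrt(sum(wi * wi for wi in w))


def multilevel_predict(score_haralick, score_nmf):
    """Use the model whose score has the larger absolute value.

    Ties are resolved in favour of the Haralick model, an arbitrary
    choice made for this sketch.
    """
    if abs(score_haralick) >= abs(score_nmf):
        chosen, score = "haralick", score_haralick
    else:
        chosen, score = "nmf", score_nmf
    label = 1 if score > 0 else -1  # +1 = stroke, -1 = non-stroke (assumed coding)
    return label, chosen


# Hypothetical model parameters and feature vectors, for illustration only.
s_h = linear_score(w=[3.0, 4.0], b=-1.0, x=[1.0, 0.5])  # Haralick model
s_n = linear_score(w=[1.0, 0.0], b=0.0, x=[-2.0, 7.0])  # NMF model
label, chosen = multilevel_predict(s_h, s_n)
```

Because only two scalar scores are compared per test sample, the combination step adds negligible cost on top of evaluating the two models.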

In this paper, we have proposed a novel CAD tool that can distinguish the structural MRI images of patients who have had a stroke from the images of patients suffering from other neurological disorders. By incorporating a multi-level classification system, we have achieved a classification efficiency of more than 83% using Non-negative Matrix Factorisation and Haralick features as the features and SVM as the classifier. This can be further improved by incorporating other features. We believe that this work can be extended to a CAD system for predicting CVAs.

Acknowledgement

The authors place on record their deepest gratitude to Dr. Shine M. Babu, Assistant Professor, Dr. Somervell Memorial CSI Medical College & Hospital, Trivandrum, and Dr. Benedict Bright, Medical Officer, CSI Mission Hospital, Trivandrum, for the support they extended to this work.

References

[1] Saver, J.L., Starkman, S., Eckstein, M., Stratton, S., Pratt, F., Hamilton, S., Conwit, R., Liebeskind, D.S., Sung, G., Sanossian, N.: Methodology of the field administration of stroke therapy–magnesium (FAST-MAG) phase 3 trial: Part 1–rationale and general methods. International Journal of Stroke (2014) 215–219
[2] World Health Organization, et al.: The top 10 causes of death. July 2013. Available at: who.int/mediacentre/factsheets/fs310/en/. [Last accessed: October 2013] (2014)
[3] Nagi, D., Aamar, A., Rahim, I., Sadaf, M.I., Butt, Z., Ahsan, I., Abbas, F., Wahab, A.: Clinical audit of patients with cerebrovascular accident and transient ischemic attack in Our Lady of Lourdes Hospital Drogheda, Ireland. Annals of King Edward Medical University (2014)
[4] Prasad, K., Vibha, D., et al.: Cerebrovascular disease in South Asia–Part I: A burning problem. JRSM Cardiovascular Disease (2012) 20
[5] Sridharan, S.E., Unnikrishnan, J., Sukumaran, S., Sylaja, P., Nayak, S.D., Sarma, P.S., Radhakrishnan, K.: Incidence, types, risk factors, and outcome of stroke in a developing country: the Trivandrum stroke registry. Stroke (2009) 1212–1218
[6] Chauhan, S., Aeri, B.T.: Prevalence of cardiovascular disease in India and it is economic impact-A review. International Journal of Scientific and Research Publications (2013) 212
[7] Potluru, V.K., Calhoun, V.D.: Group learning using contrast NMF: Application to functional and structural MRI of schizophrenia. In: Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, IEEE (2008) 1336–1339
[8] Padilla, P., López, M., Górriz, J.M., Ramirez, J., Salas-Gonzalez, D., Álvarez, I.: NMF-SVM based CAD tool applied to functional brain images for the diagnosis of Alzheimer's disease. Medical Imaging, IEEE Transactions on (2012) 207–216
[9] Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural features for image classification. Systems, Man and Cybernetics, IEEE Transactions on (1973) 610–621
[10] Paatero, P., Tapper, U.: Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics (1994) 111–126
[11] Illán, I., Górriz, J., Ramírez, J., Salas-Gonzalez, D., López, M., Segovia, F., Chaves, R., Gómez-Rio, M., Puntonet, C.G.: 18F-FDG PET imaging analysis for computer aided Alzheimer's diagnosis. Information Sciences (2011) 903–916
[12] Panachakel, J.T.: Contourlet transform and iterative noise free filtering based bilayer filter for enhancing echocardiogram. In: Green Technologies (ICGT), 2012 International Conference on, IEEE (2012) 301–306
[13] Panachakel, J.T., George, M.: Bilayer denoising of echocardiography images using contourlet transform and iterative noise-free filtering. International Journal of Applied Engineering Research (2012)
[14] Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems. (2001) 556–562
[15] Donoho, D., Stodden, V.: When does non-negative matrix factorization give a correct decomposition into parts? In: Advances in Neural Information Processing Systems. (2003)
[16] Berry, M.W., Browne, M., Langville, A.N., Pauca, V.P., Plemmons, R.J.: Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis (2007) 155–173
[17] Tropp, J.A.: Literature survey: Nonnegative matrix factorization. University of Texas at Austin (2003)
[18] Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning (1995) 273–297
[19] Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery (1998) 121–167