Medical Image Classification via SVM using LBP Features from Saliency-Based Folded Data
To appear in proceedings of The 14th International Conference on Machine Learning and Applications (IEEE ICMLA'15), Miami, Florida, USA, 2015.
Zehra Çamlica, H.R. Tizhoosh and Farzad Khalvati
KIMIA Lab, University of Waterloo, Canada [ [email protected] ]
Sunnybrook Health Sciences Centre, Toronto, Canada
Abstract — Good results on image classification and retrieval using support vector machines (SVM) with local binary patterns (LBPs) as features have been extensively reported in the literature, where an entire image is retrieved or classified. In contrast, in medical imaging, not all parts of the image may be equally significant or relevant to the image retrieval application at hand. For instance, in a lung x-ray image, the lung region may contain a tumour, hence being highly significant, whereas the surrounding area does not contain significant information from a medical diagnosis perspective. In this paper, we propose to detect salient regions of images during training and fold the data to reduce the effect of irrelevant regions. As a result, smaller image areas will be used for LBP feature calculation and, consequently, classification by SVM. We use the IRMA 2009 dataset with 14,410 x-ray images to verify the performance of the proposed approach. The results demonstrate the benefits of the saliency-based folding approach, which delivers classification accuracy comparable with the state of the art but exhibits lower computational cost and storage requirements, factors highly important for big data analytics.
Keywords — Image classification, saliency, folding, local binary patterns, support vector machines
I. INTRODUCTION
Recent advances in medical imaging devices have led to the generation of big image data on a daily basis. The main purpose of a medical information system is the acquisition of necessary information to provide high-quality care through accurate and efficient diagnosis and treatment planning [1]. In order to implement advanced information systems operating on large databases (hence handling big image data), suitable methods are required to respond to a query (an image selected by a clinician) by retrieving images that have similar characteristics. Content-based image retrieval (CBIR) uses image search techniques that incorporate visual features, such as color, texture, and shape, in order to respond to user queries. In the medical imaging context, CBIR can enormously contribute to more reliable diagnosis, among others, by classifying the query image and retrieving similar images already annotated with diagnostic descriptions and treatment results.

The main purpose of this work is to obtain a high classification score with less computational complexity and lower storage requirements. In order to save time and to achieve a high classification score, first a salient region detector is used [2]. Next, images are folded to mainly contain salient areas and to reduce the effect of irrelevant (non-salient) regions. Subsequently, we can extract LBP features from folded images and classify them via SVM. We use the IRMA x-ray dataset with 14,410 images for training and testing. The classification result is computed with the reported ImageCLEF error score evaluations for different methods [3].

This paper is organized as follows: In Section II, a brief background review on medical image retrieval is given. In Section III we describe the proposed approach. Section IV reports the experimental results using the IRMA dataset. Section V concludes the paper.

II. LITERATURE REVIEW
There is a clear demand for fast and accurate image search technologies in clinical settings where physicians (e.g., radiologists) desire to search for similar images of past patients when examining a current patient. Content-based image retrieval (CBIR) has been subject to research to satisfy some aspects of this demand. CBIR takes advantage of visual contents of an image such as colour, shape and texture to search for (similar) images in large archives. Generally, a software system that can access medical archives to search for similar images is a CBIR system. The "content-based" aspect of CBIR simply means that the search is conducted based on some visual (pictorial) features of the image, and not based on text annotations (the latter is mainly used when we search on the internet). Some examples of medical CBIR systems are TELEMED [4], ASSERT [5] and IRMA [6].

The features used in CBIR can be textual or visual. Recent medical image retrieval systems increasingly rely on visual features that can be low-level (primitive), mid-level (logical), or high-level (abstract). Almost all early CBIR systems are based on low-level features (colour or shape), but recently, mid- and high-level image representations have received more attention. Mid-level features are obtained from particular parts of the image, which are important regions with significant details [7], [8], [2], [9]. High-level features are represented with semantic design. The semantic design (high-level features such as emotions, objects and events) can be present in visual or textual information.

Local binary patterns (LBP) are utilized as features for texture description [10]. LBP descriptors are commonly used in facial expression analysis and recognition [11], [12], [13]. LBP measures invariant texture of gray-scale images by utilizing local neighborhoods.
The basic LBP operator replaces pixel values with labels by binarizing the 3 × 3 neighborhood around each pixel, using the centre pixel as a threshold. Pixel labels are then converted to decimal numbers. Because LBP is an easy-to-compute feature extraction method, it has been successfully used in many studies such as face recognition and image annotation [11], [14], [15], [16], [17]. In the proposed method, LBP is applied to multi-block patches in the image at different scales. After labeling the image parts, the feature histogram is extracted from the local region labels. The regions can be rectangular, circular or triangular. Recently, a new approach to binary encoding of local image information has been proposed which uses "barcodes" based on thresholding projections via the Radon transform [18].

Different methods can be used to classify images [19], [20], [21], [22]. For our classification, we use SVM in this paper, which is a supervised learning method to classify datasets. It investigates sets of feature vectors in an N-dimensional space. It uses support vectors to construct a hyperplane that separates different classes by maximizing the margin between them as defined by the given hyperplane [23].

Algorithm 1 Proposed Approach
------- Pre-Processing -------
Read all images I_i
Calculate saliency template S*
I_i^F ← Apply folding on all images I_i
Save S* and all folded images I_i^F
------- Training -------
Read folded images I_i^F
Set number of classes N_C
Extract LBP features from folded data
Train SVM to generate the support vectors v_1, v_2, ...
Save v_1, v_2, ...
------- Online Classification -------
Read the query image I_q
Read the saliency template S*
Read the support vectors v_1, v_2, ...
I_q^S ← Apply the saliency template S* on I_q
I_q^F ← Apply folding F on I_q^S
Extract LBP features from I_q^F
Classify the query using SVM

III. THE PROPOSED APPROACH
In this section, we present the proposed image classification method. The approach comprises a pre-processing phase, an offline training phase and an online usage phase. During pre-processing, saliency maps are extracted and images are folded. In the offline phase, an SVM is trained using LBP features of both folded and unfolded images. Finally, online classification is described. Algorithm 1 gives a generic overview of the proposed approach.
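As a rough illustration, the three phases of Algorithm 1 can be sketched as a pipeline. All function arguments here (saliency_fn, fold_fn, lbp_fn, svm) are placeholders for the components described in the following subsections, not the authors' implementation:

```python
import numpy as np

def preprocess(images, saliency_fn, fold_fn):
    """Pre-processing: average per-image saliency maps into a
    template S*, then fold every training image against it."""
    template = np.mean([saliency_fn(img) for img in images], axis=0)
    folded = [fold_fn(img, template) for img in images]
    return template, folded

def train(folded_images, labels, lbp_fn, svm):
    """Offline training: LBP features of folded images feed an SVM."""
    features = [lbp_fn(img) for img in folded_images]
    svm.fit(features, labels)
    return svm

def classify(query, template, fold_fn, lbp_fn, svm):
    """Online classification: fold the query with the stored
    template, extract LBP features, classify with the trained SVM."""
    return svm.predict([lbp_fn(fold_fn(query, template))])[0]
```

Only the saliency template and the support vectors need to be stored between the offline and online phases, which is where the storage savings of folding come from.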
A. Preprocessing
The pre-processing of image data mainly consists of two procedures. The first procedure creates a saliency template, and the second procedure performs the image folding based on the saliency template.
Algorithm 2 Pre-Processing Stage: Saliency Template S*
N_C ← number of classes; i = 1
Initialize saliency template S* = []
while i ≤ N_C do
    Calculate the saliency map S_i for image I_i [2]
    S* ← S* + S_i
    i ← i + 1
end while
S* ← S* / N_C
1) Saliency Map:
The detection of salient regions of an image is crucial to extract effective information. We propose to create a saliency template by averaging all saliency maps, which are detected by the context-aware saliency algorithm [2]. The context-aware saliency algorithm detects image regions that best represent the "scene". It is a detection algorithm, as its authors state, "based on four principles observed in the psychological literature: local low-level considerations, global considerations, visual organizational rules, and high-level factors". Local low-level factors (such as contrast and color), global calculations suppressing frequently occurring features, visual organization rules (visual forms may possess one or several centres of gravity) and high-level factors (such as priors on the salient object location and object detection) are considered by the algorithm. The implementation of this algorithm is available on the authors' website.

Saliency maps of all training images are generated and averaged to calculate a saliency template S* (Algorithm 2). Figure 1 shows three images, their saliency maps, and the saliency template created by averaging all saliency maps. The average of saliency maps is first calculated internally within each class; then the average is taken across all classes.

The salient, less salient and non-salient areas are defined for the training data by dividing images into N sub-blocks. Then, based on the saliency template, the folding is applied. The new images with reduced area can now be used for local pattern analysis.
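The two-level averaging described above (first within each class, then across classes) can be sketched as follows; here saliency_map is a placeholder standing in for the context-aware detector of [2]:

```python
import numpy as np

def saliency_template(images_by_class, saliency_map):
    """Build the template S* (cf. Algorithm 2): average the saliency
    maps within each class, then average the class means across all
    classes. `saliency_map` is a stand-in for the context-aware
    saliency detector; all images are assumed to share one size."""
    class_means = [
        np.mean([saliency_map(img) for img in imgs], axis=0)
        for imgs in images_by_class
    ]
    return np.mean(class_means, axis=0)
```

The per-class averaging prevents large classes from dominating the template; each class contributes one mean map regardless of its size.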
2) Image Folding:
Folding the rectangular region A ⊂ I within image I, resulting in an image I′ ⊂ I, can be given through I′ = A + I \ A, where the sign "\" denotes set-theoretical subtraction and the remainder I \ A is folded inwardly onto A. The main purpose of folding is to reduce the image area without losing information while reducing the dimensionality of the features (see Fig. 2). The folding steps are described in Algorithm 3.

B. Offline Training
LBP features are extracted from K (M > K) sub-blocks of the image with different scaling factors (1 and 2). The LBP feature vector for an image has 1,062 dimensions with the following configuration: M = 4 × 4, K = 3 × 3. The LBP histogram features from the training data are used to train a multi-class SVM [24] to classify images. The SVM kernel type is set to be the Radial Basis Function.

Saliency implementation: http://webee.technion.ac.il/labs/cgm/Computer-Graphics-Multimedia/Software/Saliency/Saliency.html

Fig. 1. A saliency map is generated for each available training image. A saliency template is then assembled by combining all saliency maps.

Fig. 2. Schematic illustration of saliency maps and image folding: The input image (left image) is processed to find a salient region (middle image). Subsequently, non-salient regions (right image, gray stripes) are marked to be folded inwardly.
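A possible realization of this feature extractor uses scikit-image's local_binary_pattern and scikit-learn's SVC; the exact implementation details are not given in the text, so the block layout and the 'nri_uniform' mapping are assumptions of this sketch. With 8 sampling points, 'nri_uniform' yields 59 codes, so a 3 × 3 grid at two scales gives 9 × 2 × 59 = 1,062 dimensions, matching the feature size quoted above:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def multiscale_lbp_features(image, grid=3, radii=(1, 2), points=8):
    """Multi-block, multi-scale LBP histogram (a sketch, not the
    authors' exact code). With points=8, the 'nri_uniform' mapping
    has points*(points-1)+3 = 59 codes; 3x3 blocks at 2 scales
    therefore yield a 1,062-dimensional feature vector."""
    h, w = image.shape
    n_bins = points * (points - 1) + 3  # 59 bins for points=8
    feats = []
    for r in radii:
        codes = local_binary_pattern(image, points, r, method="nri_uniform")
        for by in range(grid):
            for bx in range(grid):
                block = codes[by * h // grid:(by + 1) * h // grid,
                              bx * w // grid:(bx + 1) * w // grid]
                hist, _ = np.histogram(block, bins=n_bins,
                                       range=(0, n_bins), density=True)
                feats.append(hist)
    return np.concatenate(feats)

# Multi-class SVM with RBF kernel, as stated in the text
clf = SVC(kernel="rbf")
```

Per-block normalization (density=True) keeps the histograms comparable across blocks of slightly different sizes.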
C. Online Classification
In the online part, an image query is selected from the IRMA [6] test database and, as new images are encountered, LBP features are calculated for the saliency-based folded image. Next, SVM classification is performed with the LBP features. We also run the experiments for LBP-SVM without folding.

Algorithm 3 Pre-Processing Stage: Image Folding
Set number of blocks M (= N × N = 4 × 4)
Read saliency template S*
Read the input image I
while not all combinations tested do
    Align two columns
    Take the summation of all pixel values in S*
    Keep s_max^(c_i) (maximum value of summed columns)
    Update s_max^column ← Σ_i s_max^(c_i)
end while
while not all combinations tested do
    Align two rows
    Take the summation of all pixel values in S*
    Keep s_max^(r_j) (maximum value of summed rows)
    Update s_max^row ← Σ_j s_max^(r_j)
end while
Find the folding F_best that satisfies s = min(s_max^column, s_max^row)
Apply the folding F_best to I

IV. EXPERIMENTS AND RESULTS
A. Data Set
The Image Retrieval in Medical Applications (IRMA, http://irma-project.org/) database is a collection of 14,410 x-ray images that have been randomly collected from daily routine work at the Department of Diagnostic Radiology of RWTH Aachen University. The downscaled images cover different ages, genders, view positions, and pathologies [3].

Each image in the dataset has an IRMA code. According to these codes, 193 classes are defined. The IRMA code comprises four axes with three to four positions each: 1) the technical code (T) (modality), 2) the directional code (D) (body orientation), 3) the anatomical code (A) (body region), and
4) the biological code (B) (the biological system examined). The complete IRMA code consists of 13 characters, TTTT-DDD-AAA-BBB, with each character in {0, . . . , 9, a, . . . , z}. As many as 12,677 images are separated for training. The remaining 1,733 images are used as test data. Figure 3 shows some sample images from the IRMA dataset along with their corresponding IRMA codes.

B. Error Measurement

The ImageCLEF project has defined an error score evaluation method in order to evaluate the classification performance of methods on the IRMA dataset [3]. As all images in the IRMA dataset are labelled with the independent technical, directional, anatomical and biological axes, the error E can be defined as follows:

E = Σ_{i=1}^{n} (1/b_i) (1/i) δ(I_i, Î_i)    (1)

where b_i is the number of possible labels at position i, and δ is the decision function delivering 1 for a wrong label and 0 for a correct label when the IRMA code of the image I_i is compared with the IRMA code of the image Î_i. For every axis, the maximal possible error is computed and the errors are normalized between 0 and 1. If all positions in all axes are wrong, the error value is 1.

C. Classification Error
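For one axis of the IRMA code, the error measure of Eq. (1) can be sketched as below. The per-position label counts b_i must be supplied by the caller, and the normalization to [0, 1] follows the description above; note that the official ImageCLEF measure additionally propagates an error at position i to all later positions of the axis, which this simplified sketch omits:

```python
def axis_error(pred, truth, b):
    """Simplified per-axis IRMA error (cf. Eq. 1): sum over code
    positions i of (1/b_i)*(1/i)*delta, where delta is 1 for a wrong
    label and 0 for a correct one, normalized so that an entirely
    wrong axis scores 1.0. `b[i-1]` is the number of possible labels
    at position i (an input assumption of this sketch)."""
    assert len(pred) == len(truth) == len(b)
    err = sum(
        (1.0 / bi) * (1.0 / i) * (p != t)
        for i, (p, t, bi) in enumerate(zip(pred, truth, b), start=1)
    )
    max_err = sum((1.0 / bi) * (1.0 / i) for i, bi in enumerate(b, start=1))
    return err / max_err
```

For a full IRMA code, the four axis errors would be combined; for example, axis_error("1121", "1121", [36] * 4) is 0.0, while a completely wrong axis gives 1.0.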
The experiments resulted in the error score reported in Table 1 for the proposed method of SVM image classification with multi-scale LBP on saliency-based folded images. If images are not folded, the SVM error decreases slightly (see Table 1). This slight decrease in error comes at a higher computational cost: the feature dimensionality is nearly twice that of the folded image. This means that the accuracy does not fall while time and computational cost decrease. Saliency-based folding reduces complexity without losing important patterns in the salient region. The computational complexity decreases because folding reduces the feature vector dimension.

Without consideration of the salient area, folding was tried in different directions; the error clearly increased. Apparently, the saliency template plays a crucial role in deciding how to fold an image.

For the sake of comparison, the IRMA dataset was used in the ImageCLEF 2009 competition with the 2008 IRMA code; basic LBP with multi-blocks was applied in [17], with the corresponding error score reported there. The lowest error score in ImageCLEF 2009 with the 2008 IRMA code is reported in [3]. The comparison of classifiers and SVM results is outlined in Table 1.

D. Memory and Time
The image area is reduced by saliency-based folding. As an effect, the feature dimensionality decreased from 1,888 to 1,062, a reduction of roughly 44%.

Without saliency-based folding, SVM requires more training and testing time; the testing time corresponds to 53 milliseconds per image for online queries. In contrast, with saliency-based folding, both times are reduced and testing takes 30 milliseconds per image (Table 1). Neglecting the overhead for the saliency calculations and looking only at the testing times (online execution), the proposed approach thus accelerates the classification process by roughly 43% per query.

Method                       | Error | t (ms)/image
MS n×n LBP/SVM               |  .55  |     53
MS n×n LBP/SVM w. folding    |  .07  |     30
TAU [3]                      |   .   |      –
VPASabanci [17]              |   .   |      –

Table 1. Image classification results (MS n×n = n × n multi-scale, t = time); results of TAU and VPASabanci as reported in the literature.

V. CONCLUSIONS
Content-based image retrieval (CBIR) depends on good classification to assign a query to the right image category. The time requirements become paramount when dealing with big data.

The proposed medical image classification using the saliency-based folding method appears to be effective when support vector machines and local binary patterns are employed. Folding non-salient (non-relevant) parts of the image may result in a slight increase of the classification error. That may be expected, since folded areas overlap with salient regions, resulting in slight distortion. However, the proposed approach does accelerate the online classification, an advantage that might be crucial for big image data (a reduction from 53 milliseconds per image to 30 milliseconds).

The decision how to fold image blocks is the most critical part of the pre-processing. Different approaches can be examined in future work to investigate the feasibility and the potential effect of folding blocks, and not necessarily just folding rows and columns. As well, one may consider the deletion of non-salient blocks altogether. This may be particularly of interest in non-medical cases where the scene may contain irrelevant information along with objects of interest.

As a potential future work, one may also investigate the incorporation of the new barcode technology [18] into retrieval-oriented classification, combined with optimization techniques that employ the concept of opposite entities [25], [26], [27].

Fig. 3. Sample images from the IRMA Dataset with their IRMA codes TTTT-DDD-AAA-BBB:
(a) 1121-127-700-500 (b) 1121-120-918-700 (c) 1121-120-942-700 (d) 112d-121-500-000 (e) 1123-127-500-000 (f) 1121-120-200-700 (g) 1121-200-412-700 (h) 1121-110-414-700 (i) 1121-240-442-700 (j) 1121-220-310-700
REFERENCES

[1] N. Sabooniha, D. Toohey, and K. Lee, "An evaluation of hospital information systems integration approaches," in Proceedings of the Int. Conf. on Advances in Computing, Communications and Informatics. ACM, 2012, pp. 498–504.
[2] S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-aware saliency detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, pp. 1915–1926, 2012.
[3] H. Müller, P. Clough, T. Deselaers, and B. Caputo, ImageCLEF: Experimental Evaluation in Visual Information Retrieval, 2010.
[4] R. Currell, C. Urquhart, P. Wainwright, and R. Lewis, "Telemedicine versus face to face patient care: effects on professional practice and health care outcomes," vol. 97, no. 35, pp. 35+, Aug. 2001.
[5] C.-R. Shyu et al., "ASSERT: A physician-in-the-loop content-based retrieval system for HRCT image databases," Computer Vision and Image Understanding, vol. 75, pp. 111–132, 1999.
[6] P. Ghosh, S. Antani, L. R. Long, and G. R. Thoma, "Review of medical image retrieval systems and future directions," in CBMS, 2011, pp. 1–6.
[7] M.-L. Shyu, S.-C. Chen, M. Chen, C. Zhang, and K. Sarinnapakorn, "Image database retrieval utilizing affinity relationships," in Proceedings of the 1st ACM International Workshop on Multimedia Databases (MMDB '03). ACM, 2003, pp. 78–85.
[8] Y. Sun and S. Ozawa, "A hierarchical approach for region-based image retrieval," in SMC (1), 2004, pp. 1117–1124.
[9] B. Ko, S. Y. Kwak, and H. Byun, "SVM-based salient region(s) extraction method for image retrieval," in ICPR (2), 2004, pp. 977–980.
[10] S. Murala and Q. M. J. Wu, "Expert content-based image retrieval system using robust local patterns," J. Vis. Commun. Image Represent., vol. 25, no. 6, pp. 1324–1334, Aug. 2014.
[11] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, 2006.
[12] G. Zhao and M. Pietikäinen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 915–928, 2007.
[13] D. Maturana, D. Mery, and Á. Soto, "Face recognition with local binary patterns, spatial pyramid histograms and naive Bayes nearest neighbor classification," in International Conference of the Chilean Computer Science Society, 2009, pp. 125–132.
[14] B. Zhang, Y. Gao, S. Zhao, and J. Liu, "Local derivative pattern versus local binary pattern: Face recognition with high-order local pattern descriptor," IEEE Trans. Image Process., vol. 19, no. 2, pp. 533–544, Feb. 2010.
[15] P. J. Phillips, H. Moon, P. Rauss, and S. A. Rizvi, "The FERET evaluation methodology for face-recognition algorithms," in Conference on Computer Vision and Pattern Recognition (CVPR '97), 1997, pp. 137–143.
[16] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, July 2002.
[17] D. Unay, O. Soldea, S. Özöğür-Akyüz, M. Çetin, and A. Erçil, "Medical image retrieval and automatic annotation: VPA-SABANCI at ImageCLEF 2009," in Working Notes for CLEF 2009 Workshop, 2009.
[18] H. R. Tizhoosh, "Barcode annotations for medical image retrieval," in Proceedings of the IEEE International Conference on Image Processing (ICIP 2015), Quebec City, Canada, 2015.
[19] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3360–3367.
[20] A. A. Othman and H. R. Tizhoosh, "Image classification using evolving fuzzy inference systems," 2013, pp. 1435–1438.
[21] A. Bosch, A. Zisserman, and X. Muñoz, "Image classification using random forests and ferns," in IEEE 11th International Conference on Computer Vision (ICCV), 2007, pp. 1–8.
[22] E. M. Arvacheh and H. R. Tizhoosh, "Pattern analysis using Zernike moments," in Proceedings of the IEEE Instrumentation and Measurement Technology Conference (IMTC 2005), 2005, vol. 2, pp. 1574–1578.
[23] J. Wu, M. Lu, and C.-L. Wang, "Enhancing SVM active learning for image retrieval using semi-supervised bias-ensemble," in ICPR. IEEE, 2010, pp. 3175–3178.
[24] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 1–27, 2011.
[25] M. Ventresca and H. R. Tizhoosh, "Simulated annealing with opposite neighbors," in IEEE Symposium on Foundations of Computational Intelligence (FOCI 2007), 2007, pp. 186–192.
[26] S. Rahnamayan and H. R. Tizhoosh, "Image thresholding using micro opposition-based differential evolution (micro-ODE)," in IEEE Congress on Evolutionary Computation, 2008, pp. 1409–1416.
[27] M. Ventresca and H. R. Tizhoosh, "Opposite transfer functions and backpropagation through time," in IEEE Symposium on Foundations of Computational Intelligence (FOCI 2007), 2007, pp. 570–577.