Classification with 2-D Convolutional Neural Networks for breast cancer diagnosis
Anuraganand Sharma
School of Computing, Information & Mathematical Sciences, The University of the South Pacific, Suva, Fiji, sharma [email protected]
Dinesh Kumar
Faculty of Science & Technology, University of Canberra, Canberra, ACT, [email protected]
Abstract
Breast cancer is the most common cancer in women. Classification of cancer/non-cancer patients with clinical records requires high sensitivity and specificity for an acceptable diagnosis test. The state-of-the-art classification model, the Convolutional Neural Network (CNN), however, cannot be used with clinical data that are represented in 1-D format. CNN has been designed to work on a set of 2-D matrices whose elements show some correlation with neighboring elements, such as in image data. Conversely, data examples represented as a set of 1-D vectors (apart from time series data) cannot be used with CNN, but can be used with other classification models such as Artificial Neural Networks or Random Forest. We propose some novel preprocessing methods of data wrangling that transform a 1-D data vector into a 2-D graphical image with appropriate correlations among the fields, so that it can be processed by CNN. We tested our methods on the Wisconsin Original Breast Cancer (WBC) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets. To our knowledge, this work is novel in non-image to image data transformation for non-time series data. The transformed data processed with CNN using VGGnet-16 shows competitive results for the WBC dataset and outperforms other known methods for the WDBC dataset.
Keywords: Convolutional Neural Networks, preprocessing, data wrangling, image classification
Preprint submitted to arXiv.org July 30, 2020
1. Introduction
In recent times, there has been growing interest in the development of machine learning (ML) models for medical datasets due to advancements in digital technology and improvements in data collection methods. Increasingly, ML-based systems have been designed as early warning or diagnostic tools for chronic illnesses, for example for diagnosing depression, diabetes and cancer [1]. Breast cancer is arguably one of the deadliest forms of cancer amongst women, with millions of reported cases around the world, many of which become fatal [2, 3]. Breast cancer is caused by abnormal growth of some of the breast cells in the lining of the milk glands or ducts of the breast (ductal epithelium) [4, 5]. Compared to healthy cells, these cells divide more rapidly and accumulate, forming a lump or mass. At this stage, the cells become malignant and may spread through the breast to lymph nodes or other parts of the body.
The study of breast cancer has attracted considerable attention in the past decades. Improving data collection and storage technologies has resulted in various types and amounts of data collected on breast cancer from around the world. These include data on Ribonucleic Acid (RNA) signatures for cell mutations that cause breast cancer [6, 7], mammogram images [8, 9] and data on symptoms and diagnosis [10]. Many traditional Computer-Aided Diagnosis (CADx) systems require hand-crafted feature extraction, which is a challenging task [11, 12]. Even conventional ML techniques require the manual extraction of an optimal set of features prior to model training. An extensive review of various feature selection and extraction techniques can be found in [13, 14]. Some commonly used approaches for ML models are Principal Component Analysis (PCA) [15], information gain [16], GA-based feature selection [17], recursive feature elimination (RFE) [18], meta-heuristic methods [19] and rough sets [20]. Feature selection and extraction, therefore, is an important consideration in the pre-processing step before applying any ML algorithm such as decision trees, Bayesian models, Support Vector Machines (SVM) and Artificial Neural Networks (ANN). The behavior of ML algorithms and their prediction accuracy are influenced by the choice of features selected [21, 22]. Often, manual feature extraction or the knowledge of domain experts is needed to gain a good understanding of the relevance of the attributes [23].
The issues surrounding the use of conventional ML algorithms have propelled the need for new approaches and methods to automatically extract features from large datasets. As a result, Deep Learning (DL) algorithms such as Convolutional Neural Networks (CNN or ConvNet) and Recurrent Neural Networks (RNNs) have emerged in recent times that can accept raw data and are automatically able to discover patterns in them [24, 25].

CNN is one of the most popular algorithms for deep learning and is mostly used for image classification, natural language processing, and time series forecasting. Its ability to extract and recognize fine features has led to state-of-the-art performance in various application domains such as computer vision, image recognition, speech recognition, and natural language processing [26, 27, 28]. CNN is an enhancement of the canonical neural network architecture that was specifically designed for image recognition in [29]. Since then, many variations have been added to the architecture of CNN to enhance its ability to produce remarkable solutions for deep learning problems, such as AlexNet [26], VGG Net [27] and GoogLeNet [30]. CNN eliminates the need for manual feature extraction because the features are learned directly by different convolutional layers [31, 26]. It does not require a separate feature extraction strategy, which needs a domain expert and other preprocessing techniques and may still fail to extract complete features [32]. Despite its huge success with image data, CNN is not designed to handle non-image data in non-time series form. Arguably, any problem that can represent the correlation of features of a given data example in a single map may be attempted via CNN.

CNNs have proven to work best on data that are in 2-D form, such as images and audio spectrograms [33].
This is attributed to the fact that the convolution technique in CNN requires data examples to have at least two dimensions. [...]

(All future references to non-image data mean data in non-time series form unless otherwise specified.)
The main motivation for this paper is to realize the potential of CNN for non-image clinical data for breast cancer, because it eliminates the need for manual feature extraction. The features are learned directly by CNN, which also produces state-of-the-art recognition results [43]. The key difference between traditional ML and DL is in how features are extracted. Traditional ML approaches use handcrafted features obtained by applying several feature extraction algorithms, and then apply the learning algorithms. In DL, on the other hand, the features are learned automatically and are represented hierarchically at multiple levels. This is the strong point of DL compared with traditional machine learning approaches [43].
We have proposed some novel methods to transform non-image clinical data of breast cancer to 2-D feature map images in R so that a large set of these kinds of data are not deprived of the services of CNN. This would also encourage other variations and/or methods for text to image transformation to be developed in the future. The scope of this paper is to broaden the usage of CNN to those applications where d-dimensional raw data consists of a set of N 1-D data vectors in R, as shown in Figure 1. Each row represents a 1-D data vector with d elements, where d, N ≥ 1. Figure 1 is a sample of the Wisconsin Original Breast Cancer dataset (WBC) used in the experiments. This dataset from UCI [10] is a record of medical examinations of patients to diagnose breast cancer, where each row is a 1-D vector representing a numerical data example. We demonstrate that our method of transforming non-image breast cancer data to image data, which is then processed by CNN, produces exceptional results for classification accuracy. Some research demonstrates the use of 1-D convolutions on 1-D datasets such as signals and time sequences [44]. Though this suggests the possibility of using 1-D convolutions in this research, our experiments revealed their unsuitability for our experimental datasets. Applying the data in its raw form to a 1-D CNN gave highly unpredictable results.

Figure 1: Snapshot of data file for Breast Cancer dataset WBC from [10]

This paper is organized as follows: Section 2 briefly describes the general architecture of CNN. Section 3 describes our three proposed methods of data wrangling from non-image Breast Cancer data [10] to image data. Section 4 describes the complete methodology of the classification of breast cancer data with CNN. Section 5 shows the experimental results and Section 6 discusses the outcome of the experiments. Lastly, Section 7 concludes the paper by summarizing the results and proposing some further extensions to the research.
2. Convolutional Neural Networks
A convolutional neural network (CNN or ConvNet) is a deep learning algorithm designed for computer vision. Its architecture is based on backpropagation artificial neural networks [29]. It takes an input image in which each pixel represents input data; the image goes through a series of feature selection steps via convolution, and the result is sent to weighted perceptrons where learning happens through backpropagation. The major advantage of CNN is its ability to learn the features by itself, while in canonical neural networks feature selection is a separate process and the final accuracy of the model depends on the choice of preprocessing and feature selection methods [45, 46]. CNN has become a prominent deep learning model with a plethora of literature available on its structure and functionality; however, a brief description of the individual layers of CNN is given below.
The feature extraction layer is the part of CNN that makes any additional domain-specific feature selection preprocessing unnecessary. This layer can be divided into 3 sublayers:

The convolutional layer directly accepts raw images as input, where a set of small filters is convolved over the image to produce one or more feature maps [47, 48]. Convolution happens by sliding the filter across the image while computing the dot product of the elements of the filter and the image [49]. This process results in the extraction of certain features from the image [50].

The results of the convolutional layer are passed through an activation function. CNN generally uses the Rectified Linear Unit (ReLU), which converts negative values to 0. It also trains the network several times faster than its counterparts such as tanh [26].

The pooling layer performs downsampling, which reduces the input size along each dimension [50]. Some common pooling methods are average pooling and max pooling, where the received image is partitioned into a set of non-overlapping rectangles. Max pooling and average pooling keep only the maximum value and the average value of every sub-region, respectively [51, 52].

After learning features in the above layer, the architecture of CNN shifts to classification. The fully connected layer is similar to the fully connected network in conventional neural network models [32]. The final layer of the CNN architecture uses a classification layer such as softmax to provide the classification output [50]. The complete architecture of CNN, taking an image of the number 2 as input, is shown in Figure 2 (taken from [46]). The image goes through all the layers and is then classified as one of the values 0 – 9.

Figure 2: A general architecture of CNN – taken from [46].
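The three feature-extraction sublayers described above can be illustrated with a minimal NumPy sketch. It is only an illustration of the general technique, not the implementation used in the experiments; the function names, the unpadded stride-1 convolution and the example filter are assumptions made here.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take dot products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Replace negative values with 0."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Hypothetical 6x6 input and a 3x3 horizontal-gradient filter (illustrative values).
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
feature_map = max_pool(relu(conv2d_valid(image, kernel)))
# Every 3x3 window of this ramp image responds with 6.0, so the pooled map is 2x2 of 6.0.
```

In a trained CNN the filter values are learned by backpropagation rather than fixed by hand as in this sketch.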
3. Preprocessing Methods to Transform Numerical Data to Image
We have proposed three basic techniques of data wrangling to convert Breast Cancer numerical data to image data. The converted image must reflect some patterns that depict a given class. We have used the Wisconsin Original Breast Cancer (WBC) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets from the UCI library [10] for the classification of numerical data in this work.
The first method (Section 3.1) is the bar graph, which represents the measurement of every feature of a given dataset. There are many possible ways of drawing a bar graph, but we have used a simple approach. The dataset is first normalized to [0, 1], then every feature is drawn based on its measured value. The width of the image in pixels is $\psi d + \gamma (d + 1)$, where $d$ is the total number of features, $\psi$ is the width of a bar and $\gamma$ is the gap between two consecutive bars. The height of the image is normalized to produce a square image. We used a 1-pixel width for $\psi$ and a 2-pixel gap for $\gamma$ in our experiments. This produces a square image of size approximately [3d × 3d]. A few data examples of the WDBC dataset converted to bar graphs are shown in Figure 3 with class labels, Benign and Malignant. The algorithm for this approach is given in the Appendix.

Figure 3: Bar graph for some data examples of WDBC dataset.

These pictures are only useful to CNN if they depict a pattern in a convolved image. The first convolutional layer produces 6 features, which are shown in Figure 4, where some distinguishing features are visible.

Intuitively, the "correct" order of the bars ought to give better results. The numerical datasets were reorganized so that related fields were put close to each other according to the order of their similarity. Firstly, a covariance matrix of the data fields was generated, then each value of the matrix was converted to a 'rank' that determines how closely one field is related to another. This is a shortest-path problem, where algorithms such as dynamic programming or any metaheuristic algorithm [53] such as the Genetic Algorithm (GA) [54], Particle Swarm Optimization [55] or the Reincarnation Algorithm (RA) [56] can be used to get the optimum order of bars based on their respective ranks. Thereafter, a new set of images was created using this new order of bars. This process is elaborated further in Section 6.
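The bar-graph transformation described above can be sketched in a few lines of NumPy. This is a sketch under stated assumptions rather than the authors' code: the function name `bar_graph_image` is hypothetical, bars are assumed to grow upward from the bottom row, and bar pixels are assumed to take the value 1 on a zero background. The image side follows the width formula $\psi d + \gamma(d+1)$ from the text.

```python
import numpy as np

def bar_graph_image(x, psi=1, gamma=2):
    """Convert a feature vector x (values already in [0, 1]) to a square bar-graph image."""
    x = np.asarray(x, dtype=float)
    d = x.size
    side = psi * d + gamma * (d + 1)        # image width; the height is set equal to it
    max_height = side - psi                 # leave some padding above the tallest bar
    heights = np.floor(max_height * x).astype(int)
    img = np.zeros((side, side))
    for k, h in enumerate(heights):
        left = gamma + k * (psi + gamma)    # first column of bar k
        if h > 0:
            # Assumption: bars start at the bottom row and bar pixels are set to 1.
            img[side - h:, left:left + psi] = 1.0
    return img

# With psi = 1 and gamma = 2 the side is 3d + 2, i.e. roughly 3d as stated in the text.
example = bar_graph_image([0.0, 0.5, 1.0])
```

For d = 3 features the sketch yields an 11 × 11 image, since 1·3 + 2·4 = 11.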
Figure 4: Features learned by the first convolutional layer for Breast Cancer dataset.

The next method (Section 3.2) is the formation of a distance matrix, which is a square matrix of size [d × d] where $d$ represents the total number of features of a given example. Matrix elements are the differences between two features, i.e., $x_{ij} = x_i - x_j$, where $x_i$ and $x_j$ represent the measurements of given features with $i, j \in [1, d]$. We used Euclidean distance in our experiments. The matrix is then normalized to [0, 1], giving an image of size [d × d], which is 3 times smaller along each side than the bar graphs described in Section 3.1. A few data examples of the WDBC dataset converted to normalized distance matrices are shown in Figure 5 with class labels. The images can easily be scaled up to [3d × 3d]. The first convolutional layer produces 6 features, similar to the bar graphs, as shown in Figure 6.

Figure 5: The normalized distance matrix for some data examples of WDBC dataset.

Figure 6: Features learned by the first convolutional layer for WDBC dataset with normalized distance matrix.

The above two strategies can be combined to give a third option for generating an image from numerical data. We create a colored image of 3 layers of size [3d × 3d], where the first layer has a normalized distance matrix, the second layer has bar graphs, and the third layer has a copy of the numerical data stored row-wise, i.e., $x_{ij} = x_i$, where $i, j \in [1, d]$ denote the row and column of the matrix and $x_i$ represents the measurement of a given feature. A few data examples of the WDBC dataset converted to the combination of options are shown in Figure 7 with the class labels.

Figure 7: Combined 3 layered matrix (colored image) for some data examples of WDBC dataset.

The first convolutional layer in this case is not able to produce any distinct feature, but the scaled-up image shows different colors with some bars in Figure 8. The 3rd convolved block (12th layer) produces some blobs scattered in the images in Figure 9.
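As a rough illustration of the distance-matrix layer and of stacking three layers into one colored image, consider the following NumPy sketch. The helper names are hypothetical, the absolute difference $|x_i - x_j|$ is used as the one-dimensional Euclidean distance (an interpretation of the text), and the three layers are assumed to have already been brought to a common size before stacking.

```python
import numpy as np

def distance_matrix_image(x):
    """[d x d] matrix of |x_i - x_j|, rescaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    m = np.abs(x[:, None] - x[None, :])     # for scalars, Euclidean distance is |x_i - x_j|
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def row_copy_layer(x):
    """Third layer: x_ij = x_i, i.e. feature i is repeated along row i."""
    x = np.asarray(x, dtype=float)
    return np.tile(x[:, None], (1, x.size))

def combine_layers(distance_layer, bar_layer, row_layer):
    """Stack three equally sized 2-D layers into one (H, W, 3) image."""
    if not (distance_layer.shape == bar_layer.shape == row_layer.shape):
        raise ValueError("all three layers must have the same shape")
    return np.stack([distance_layer, bar_layer, row_layer], axis=-1)

x = np.array([0.0, 0.5, 1.0])
dist = distance_matrix_image(x)
rows = row_copy_layer(x)
# A zero array stands in for the bar-graph layer purely to show the stacking step.
rgb = combine_layers(dist, np.zeros_like(dist), rows)
```

The distance layer is symmetric with a zero diagonal, so half of its pixels are redundant; the row-wise copy in the third layer keeps the raw measurements available to the network.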
4. Classification of Non-Image Data With CNN
As described in Section 2, CNN completes the classification process in two steps. The first step is the automatic feature extraction from the images and the second step is the classification of the same images with backpropagation neural networks. A numerical dataset that is not in the form of images first goes through the data wrangling process described in Section 3, where any one of the three options is used for non-image to image data conversion. The transformed images may not make logical sense to human eyes, but CNN is capable of extracting relevant features from them. Figure 10 illustrates the complete flowchart of the training process of CNN with non-image datasets. The process contains four important parts. Firstly, the numeric input data (A) undergoes the pre-processing step of data wrangling (B), where it is normalized and converted to 2D image format using one of the data wrangling techniques described in Section 3 (the figure shows the distance matrix method of Section 3.2). The generated image is filtered through the CNN convolution layers for feature extraction (C). The features are trained in the fully connected layers to obtain classification outputs (D).

Figure 8: Features learned by the first convolutional layer for WDBC dataset with normalized distance matrix.

Figure 9: Features learned by the 12th convolutional layer for WDBC dataset.
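Steps (A) and (B) of this process can be outlined as a small pre-processing helper. The sketch below is illustrative only: the function names are hypothetical, and per-feature min-max scaling is an assumption, since the text states that the data are normalized to [0, 1] without specifying how.

```python
import numpy as np

def min_max_normalize(X):
    """Rescale every column (feature) of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                   # avoid division by zero for constant features
    return (X - lo) / span

def to_image_batch(X, transform):
    """Normalize X, apply a per-example transform (vector -> 2-D array), stack to (N, H, W)."""
    return np.stack([transform(row) for row in min_max_normalize(X)])

# Hypothetical transform: the unscaled distance matrix of one example.
def pairwise_abs(v):
    return np.abs(v[:, None] - v[None, :])

X = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
batch = to_image_batch(X, pairwise_abs)
```

The resulting array of shape (N, H, W) is the kind of input that steps (C) and (D) would consume in any CNN framework.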
5. Experiments
The objective of the experiment is to provide an alternative classification method with CNN for the non-image dataset of Breast Cancer and other similar datasets without any need for manual feature selection. We have used the WBC and WDBC datasets from the UCI library [10] for the experiments.

Figure 10: A complete process of non-image data classification with CNN. Panels: (A) input data in numeric format; (B) pre-processing: data wrangling, normalization and conversion to 2D image; (C) CNN feature extraction: convolution + ReLU and max pooling feature maps; (D) classification: fully connected layers with outputs Class 1 and Class 2.
The properties of these datasets are given in Table 1. We have compared the efficacy of our method with other published state-of-the-art methods used for Breast Cancer diagnosis, namely, variations of Neural Networks (NN) [57], Support Vector Machine (SVM) [58, 16, 59], Decision Tree (DT) [60] and Naïve Bayes (NB) [61]. These methods are generally supported by additional feature selection methods such as IG, rough sets or weighted NB.
Table 1: Experimented Dataset

| Dataset | Attributes | Instances | Missing Values | Class Ratio (Benign:Malignant) |
|---|---|---|---|---|
| WDBC | 32 | 569 | 0 | 357:212 |
| WBC | 10 | 699 | 16 | 458:241 |

For CNN, we used the VGG16 [27] architecture with 4 convolutional blocks. Each convolutional block has a 2D convolutional layer with a filter size of [3 × 3] and [...] × Layer × $\left|\sqrt{\lVert \text{image} \rVert}\right|$ filters, a ReLU layer and lastly a max pooling layer with a pool size and stride of [2 × 2]. [...]

Table 2: Parameter setting for CNN

| Parameter | Value |
|---|---|
| Max iterations | 1000 |
| Attempts | 30 |
| Filter size | 3 × 3 |
| η (with log transformation) | 0.02 |
| Momentum | 0.88 |
| L2 regularization | 9.4E-7 |
| Batch Size | 8 |

[...] Sensitivity is the ability to correctly identify those with the disease, and specificity is correctly identifying those without the disease. Alternatively, the F1 score can be used as a derived metric that merges both sensitivity and precision measures. Tables 5 and 6 show the best and average of these additional metrics, respectively, for the WDBC and WBC datasets on classification. We have also performed experiments using CNN with 1-D convolutions on raw data without any sophisticated data transformation. However, this gave poor results compared to our method, with average classification accuracies of 76.11 and 89.64 for the WDBC and WBC datasets respectively.

Table 3: Best results obtained on classification accuracy
Columns: Dataset | Transformation Type | Image Size px1 (Val, Test) | px2 (Val, Test) | px4 (Val, Test)
The comparison of our methods with other state-of-the-art methods is shown in Table 7. The table shows different methods from 2009 to 2019. The results show the accuracy, sensitivity and specificity for the WBC and/or WDBC datasets. The authors in [11] used mammogram images of breast cancer, as CNN works on images. In some cases, authors obtained 100% accuracy with 10-fold cross-validation for the WBC dataset. A lower fold of cross-validation generally gives lower accuracy [58, 16, 57].

Table 4: Average results for classification accuracy
Columns: Dataset | Transformation Type | Image Size px1 (Val, Test) | px2 (Val, Test) | px4 (Val, Test)

Table 5: Best Score with Type3 on px1

| Dataset | Score Type | Sensitivity | Specificity | F1 | Time (sec) |
|---|---|---|---|---|---|
| WDBC | Best Sensitivity | 1.00 | 1.00 | 1.00 | 13.3 |
| WDBC | Best Specificity | 1.00 | 1.00 | 1.00 | 9.8 |
| WBC | Best Sensitivity | 1.00 | 0.99 | 0.99 | 15.9 |
| WBC | Best Specificity | 0.96 | 1.00 | 0.98 | 12.8 |

Table 6: Average Score with Type3 on px1

| Dataset | Score Type | Avg Score | Run Time |
|---|---|---|---|
| WDBC | Specificity | 0.96 | 13.2 sec |
| WDBC | Sensitivity | 0.96 | |
| WDBC | F1 | 0.94 | |
| WBC | Specificity | 0.97 | 13.5 sec |
| WBC | Sensitivity | 0.97 | |
| WBC | F1 | 0.96 | |

Table 7: Comparison of the proposed method with other methods

| Authors | Year | Method | Accuracy | Sensitivity | Specificity | Dataset |
|---|---|---|---|---|---|---|
| Akay | 2009 | SVM with F-score feature selection | 99.51% | 100 | 97.91 | WBC |
| Chen et al. | 2011 | Rough set (RS) and SVM | [...] | [...] | [...] | WBC |
| Onan | 2015 | Fuzzy-rough nearest neighbor | 99.72% | 100 | 99.47 | WBC |
| Bhardwaj et al. | 2015 | Genetically Optimized NN | [...] | [...] | [...] | WBC |
| Karabatak | 2015 | Naïve Bayesian (NB) | 98.54% | 99.11 | 98.25 | WBC |
| Wang et al. | 2018 | SVM based ensemble learning | 97.10% | 97.11 | 97.23 | WBC |
| Na Liu et al. | 2019 | IGSAGAW with CSSVM | 95.80% | - | - | WBC |
| Authors of this paper | [...] | [...] | [...] | [...] | [...] | WBC |
| [...] | [...] | CNN on mammogram images | 82.43% | 81.00 | 72.26 | Mammogram |
| Wang et al. | 2018 | SVM based ensemble learning | 97.68% | 94.75 | 99.49 | WDBC |
| Na Liu et al. | 2019 | IGSAGAW with CSSVM | 95.70% | - | - | WDBC |
| Authors of this paper | [...] | [...] | [...] | [...] | 100 | WDBC |
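The sensitivity, specificity and F1 measures reported in this section follow the usual confusion-matrix definitions. The sketch below shows how they can be computed from raw counts; the function name and the example counts are hypothetical and are not taken from the experiments.

```python
def diagnostic_scores(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and F1 from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0    # undefined ratios are reported as 0

    sensitivity = ratio(tp, tp + fn)        # share of diseased cases that are detected
    specificity = ratio(tn, tn + fp)        # share of healthy cases that are cleared
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts, with "malignant" treated as the positive class.
scores = diagnostic_scores(tp=40, fp=2, tn=55, fn=3)
```

Treating malignant as the positive class is an assumption of this sketch; swapping the classes exchanges the roles of sensitivity and specificity.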
6. Discussion
The experimental results of data transformation from non-image breast cancer datasets to images are promising for the use of CNN for classification. Although the proposed methods are in the early stages, the obtained results are significant for the development of new data wrangling strategies for deep learning. This also provides an opportunity to derive even better alternatives for CNN in the future. It was observed that our proposed combined approach, i.e. the Type-3 transformation with a bar width of 1 pixel (px1), was the most effective method, as it carries the most information about the data in the three dimensions of an image. It outperformed other methods for the WDBC dataset by reaching 100% accuracy (with 1.0 sensitivity, specificity and F1 score). It also showed very competitive results for the WBC dataset with 99.27% accuracy, 1.0 sensitivity, 0.99 specificity and 0.99 F1 score.

As discussed in Section 3, different orders of bars for Type-1 and Type-3 transformations produce different images. A bar represents its corresponding field value for a given sample. We have tried to bring related bars closer to each other by using a covariance matrix that determines the "closeness" of two fields. For example, Figure 11 shows the Adjacency Matrix of the co-variance of each field for the WBC dataset. The data is arranged row-wise such that each value represents the rank of the i-th row with the j-th column of a given field. To get the "best" arrangement of fields, we minimize the total co-variance rank by using the meta-heuristic algorithm GA to solve this shortest path problem. The process of minimization for WDBC is shown in Figure 12, where the minimum rank is obtained by the end of the 10th generation. The dataset fields were reorganized so that related fields were put close to each other according to the order of their similarity. The final orders of fields for WBC and WDBC produced through minimum ranks are shown in Table 8. The images of these datasets were generated accordingly for the experiment.

Table 8: Order of fields based on minimization of total co-variance of adjacency matrix

| Dataset | Order of Fields |
|---|---|
| WBC | [5, 4, 6, 2, 3, 7, 9, 1, 10, 8] |
| WDBC | [5, 27, 14, 16, 4, 11, 2, 10, 3, 6, 1, 7, 13, 29, 20, 24, 8, 21, 22, 17, 25, 26, 12, 30, 9, 18, 23, 19, 28, 15, 31] |

Figure 11: Ranking of co-variance for WBC dataset in Adjacency Matrix

Figure 12: Minimization of total covariance for a given combination of fields for WDBC dataset

The only shortcoming of the CNN algorithm is its higher processing cost compared with other methods, especially with bigger images. Generally, it takes 9-15 seconds for a MATLAB 2018 program to complete the training process on a DELL XPS i7-9700 @ 3GHz machine with 8 CPUs and an NVIDIA GEFORCE RTX 2060 GPU. Despite this, the experimental results demonstrate that the size of the data has no direct impact on the performance of CNN. Additionally, with the advent of quantum computing [62] and parallel GPUs with enough memory, results can be produced in a reasonable time frame. The data wrangling process of converting non-image data to images is not too expensive either. The every-case time complexity of the bar graph approach is of the order $O(Nd)$ and that of the normalized distance matrix is of the order $O(Nd^2)$. The details of the algorithms are given in the Appendix.
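The field-ordering step discussed in this section can be sketched as follows. This is not the GA used in the paper: a simple greedy nearest-neighbour heuristic is swapped in to keep the example short, the function names are hypothetical, and the row-wise ranking of absolute covariances (rank 1 for the most related field) is one possible reading of the rank matrix described in the text.

```python
import numpy as np

def rank_matrix(X):
    """Row-wise ranks of |covariance|: rank 1 marks the field most related to field i."""
    cov = np.abs(np.cov(np.asarray(X, dtype=float), rowvar=False))
    np.fill_diagonal(cov, -np.inf)          # a field is never its own neighbour
    order = np.argsort(-cov, axis=1)        # most related field first
    ranks = np.empty_like(order)
    rows = np.arange(cov.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, cov.shape[1] + 1)
    return ranks

def path_cost(ranks, path):
    """Total rank accumulated between consecutive fields of an ordering."""
    return int(sum(ranks[a, b] for a, b in zip(path[:-1], path[1:])))

def greedy_order(ranks):
    """Try every start field, always step to the closest unused field, keep the cheapest path."""
    d = ranks.shape[0]
    best = None
    for start in range(d):
        path, unused = [start], set(range(d)) - {start}
        while unused:
            nxt = min(unused, key=lambda j: (ranks[path[-1], j], j))
            path.append(nxt)
            unused.remove(nxt)
        if best is None or path_cost(ranks, path) < path_cost(ranks, best):
            best = path
    return best

# Hypothetical toy data with three fields (columns); field 1 is twice field 0.
X = np.array([[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1]], dtype=float)
ranks = rank_matrix(X)
order = greedy_order(ranks)
```

A greedy search gives no optimality guarantee; a GA or dynamic programming, as mentioned in the text, would explore the space of orderings more thoroughly.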
7. Conclusion
The objective of this paper was to process non-image data (in non-time series form) from the Breast Cancer datasets WDBC and WBC with CNN, due to its state-of-the-art performance and its elimination of manual feature extraction for image recognition applications. The utilization of CNN has been confined largely to image data, except for some domain-specific data conversion techniques such as in NLP and voice recognition. We have proposed some novel approaches to convert numerical non-time series data to image data. This process of conversion is very straightforward, with an efficiency of the order of not more than $O(Nd^2)$. The experimental results on classification accuracy show the competitiveness of these methods. There is also high potential for improving these approaches further to obtain better results. For example, bar graphs with different shapes, sizes, colors and even arrangements can be tried. Similarly, the distance matrix can be enhanced to hold more information, such as the mean/variance of the neighboring elements. It still remains to be seen how other applications with various types and orientations of numerical data would respond to CNN after non-image to image data conversion. Intuitively, the more information the image carries about the data, the better the results, as observed with the combined approach. Finally, classification of numerical data on a 1-D CNN without any sophisticated data transformation did not produce acceptable results.

Appendix A. Algorithm for Equidistant Bar Graph
The pseudocode of the algorithm for the equidistant bar graph is given in Algorithm 1. Depending on the required size of the image, the parameters ψ and γ can be set to define the bar width and the constant gap size between two consecutive bars, respectively. All the distances are in pixels. The length and width of the image are calculated as R and C. The maximum height of the bars is H, which leaves some padding distance. I is a zero matrix of size R × C. X_i is the i-th data example from the dataset X. B_i holds the heights of the bars of a data example X_i.

Appendix B. Algorithm for Normalized Distance matrix
The pseudocode of the algorithm for the normalized distance matrix is given in Algorithm 2. Here we use the same parameters as used in the above section of the algorithm for the Equidistant Bar Graph. Additionally, the graph can be expanded by [e_1 × e_2] with a matrix E of size [e_1 × e_2] where each element is 1. Normalization of values between 0 and 1 is given by normalize.

Algorithm 1: Equidistant Bar Graph

    ψ ← [...]; γ ← [...];
    R ← ψ * d + γ * (d + 1);
    C ← ψ * d + γ * (d + 1);
    H ← R − ψ;
    I ← O_{R×C};   // zero matrix of size R × C
    for i = 1 : N do
        M_i ← I
        B_i ← ⌊H * X_i⌋   // bars
        j ← γ + 1
        k ← 1
        while j ≤ C − γ do
            G = 0;
            M_i[ψ ... B_i(k), j ... (j + ψ − 1)] ← [...]
            k ← k + 1
            j ← j + γ + ψ
            if k > d then
                break
            end
        end
        Save(M_i)
    end

Algorithm 2: Normalized Distance Matrix

    E ← [1 ... 1; ... ; 1 ... 1];   // expand by [e_1 × e_2]
    I ← O_{R×C};   // zero matrix of size R × C
    for i = 1 : N do
        M_i ← I
        for r = 1 : d do
            for c = 1 : d do
                M_i(r, c) = X_i(r) − X_i(c);
                M_i(r, c) = M_i(r, c) * E;
            end
        end
        M_i = normalize(M_i);
        Save(M_i)
    end
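The expansion by the all-ones matrix E in Algorithm 2 amounts to a Kronecker product, which replaces every matrix element by an e_1 × e_2 block of the same value. A minimal NumPy sketch of this step follows; the function names are hypothetical, and the absolute value is an added assumption, chosen to match the Euclidean distance mentioned in Section 3, whereas Algorithm 2 writes a plain difference.

```python
import numpy as np

def normalize(m):
    """Rescale all values of m to [0, 1]; a constant matrix maps to zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def expanded_distance_matrix(x, e1=1, e2=1):
    """Distance matrix of x with every element expanded to an e1 x e2 block."""
    x = np.asarray(x, dtype=float)
    diff = np.abs(x[:, None] - x[None, :])  # assumption: absolute difference
    return normalize(np.kron(diff, np.ones((e1, e2))))

# Two features expanded by a 2 x 3 block give a 4 x 6 image.
img = expanded_distance_matrix([0.0, 1.0], e1=2, e2=3)
```

With e1 = e2 = 3 this yields the [3d × 3d] scaling mentioned in Section 3.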
References

[1] E. Sourla, S. Sioutas, V. Syrimpeis, A. Tsakalidis, G. Tzimas, Cardiosmart365: artificial intelligence in the service of cardiologic patients, Advances in Artificial Intelligence 2012 (2012) 2.
[2] F. Gao, T. Wu, J. Li, B. Zheng, L. Ruan, D. Shang, B. Patel, Sd-cnn: A shallow-deep cnn for improved breast cancer diagnosis, Computerized Medical Imaging and Graphics 70 (2018) 53–62.
[3] M. L. Tsai, M. Knaack, P. Martone, J. Krueger, S. R. Baldinger, T. J. Lillemoe, B. Susnik, E. Grimm, S. Olet, N. Rueth, et al., Breast cancer diagnosed in young women ≤ age 35: Effects of germline pathogenic variants, cancer subtypes, tumor-related characteristics, and pregnancy-associated diagnosis on outcomes, Clinical Breast Cancer (2020).
[4] Breast cancer - Latest research and news | Nature.
[5] Breast cancer | definition of breast cancer by Medical dictionary.
[6] P. Kaur, T. B. Porras, A. Ring, J. D. Carpten, J. E. Lang, Comparison of tcga and genie genomic datasets for the detection of clinically actionable alterations in breast cancer, Scientific reports 9 (2019) 1–15.
[7] M. J. Larsen, M. Thomassen, Q. Tan, K. P. Sørensen, T. A. Kruse, Microarray-based rna profiling of breast cancer: batch effect removal improves cross-platform consistency, BioMed research international 2014 (2014).
[8] K. Dembrower, P. Lindholm, F. Strand, A multi-million mammography image dataset and population-based screening cohort for the training and evaluation of deep neural networks - the cohort of screen-aged women (csaw), Journal of digital imaging (2019) 1–6.
[9] K. Bowyer, D. Kopans, W. Kegelmeyer, R. Moore, M. Sallam, K. Chang, K. Woods, The digital database for screening mammography, in: Third international workshop on digital mammography, volume 58, p. 27.
[10] D. Dheeru, E. Karra Taniskidou, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, 2019.
[11] W. Sun, T.-L. B. Tseng, J. Zhang, W. Qian, Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data, Computerized Medical Imaging and Graphics 57 (2017) 4–9.
[12] M. Firmino, G. Angelo, H. Morais, M. R. Dantas, R. Valentim, Computer-aided detection (CADe) and diagnosis (CADx) system for lung cancer with likelihood of malignancy, BioMedical Engineering OnLine 15 (2016) 2.
[13] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of machine learning research 3 (2003) 1157–1182.
[14] V. Kumar, S. Minz, Feature selection: a literature review, SmartCR 4 (2014) 211–229.
[15] I. K. Fodor, A survey of dimension reduction techniques, Technical Report, Lawrence Livermore National Lab., CA (US), 2002.
[16] N. Liu, E.-S. Qi, M. Xu, B. Gao, G.-Q. Liu, A novel intelligent classification model for breast cancer diagnosis, Information Processing & Management 56 (2019) 609–623.
[17] O. H. Babatunde, L. Armstrong, J. Leng, D. Diepeveen, A genetic algorithm-based feature selection (2014).
[18] B. F. Darst, K. C. Malecki, C. D. Engelman, Using recursive feature elimination in random forest to account for correlated variables in high dimensional data, BMC genetics 19 (2018) 65.
[19] M. Sharma, P. Kaur, A comprehensive analysis of nature-inspired meta-heuristic techniques for feature selection problem, Archives of Computational Methods in Engineering (2020) 1–25.
[20] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Springer Science & Business Media, 2012. Google-Books-ID: yeOoCAAAQBAJ.
[21] I. Guyon, A. Elisseeff, An Introduction to Variable and Feature Selection, Journal of Machine Learning Research 3 (2003) 1157–1182.
[22] R. K. Singh, M. SivaBalakrishnan, Feature Selection of Gene Expression Data for Cancer Classification: A review, in: 2nd International Symposium on Big Data and Cloud Computing, pp. 52–57.
[23] M. S. Mohamad, S. Deris, S. M. Yatim, M. R. Othman, Feature Selection method using genetic algorithm for the classification of small and high dimension data, in: First International Symposium on Information and Communication Technologies.
[24] D. Kumar, D. Sharma, Deep Learning in Gene Expression Modeling, in: Handbook of Deep Learning Applications, Springer, 2019, pp. 363–383.
[25] Z. Cui, W. Chen, Y. Chen, Multi-Scale Convolutional Neural Networks for Time Series Classification, arXiv:1603.06995 [cs] (2016). ArXiv: 1603.06995.
[26] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 1097–1105.
[27] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv:1409.1556 [cs] (2014). ArXiv: 1409.1556.
[28] A. Volokitin, G. Roig, T. A. Poggio, Do Deep Neural Networks Suffer from Crowding?, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017, pp. 5628–5638.
[29] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation 1 (1989) 541–551.
[30] C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9.
[31] T. Guo, J. Dong, H. Li, Y. Gao, Simple convolutional neural network on image classification, in: 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), pp. 721–724.
[32] S. Indolia, A. K. Goswami, S. P. Mishra, P. Asopa, Conceptual Understanding of Convolutional Neural Network - A Deep Learning Approach, Procedia Computer Science 132 (2018) 679–688.
[33] W. Li, B. Victor, L. Xiao, H. Chen, Deep learning: An overview - lecture notes, "https://studylib.net/doc/15672646/deep-learning–an-overview-university-of-arizona-1", 2015. [Online; accessed 10-Jan-2020].
[34] N. G. Nguyen, V. A. Tran, D. L. Ngo, D. Phan, F. R. Lumbanraja, M. R. Faisal, B. Abapihi, M. Kubo, K. Satou, Dna sequence classification by convolutional neural network, Journal of Biomedical Science and Engineering 9 (2016) 280.
[35] M. Delakis, C. Garcia, Text detection with convolutional neural networks, in: VISAPP (2), pp. 290–294.
[36] H. Xu, F. Su, Robust seed localization and growing with deep convolutional features for scene text detection, in: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, ACM, pp. 387–394.
[37] C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi, Inception-v4, inception-ResNet and the impact of residual connections on learning, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI'17, AAAI Press, San Francisco, California, USA, 2017, pp. 4278–4284.
[38] H. I. Fawaz, B. Lucas, G. Forestier, C. Pelletier, D. F. Schmidt, J. Weber, G. I. Webb, L. Idoumghar, P.-A. Muller, F. Petitjean, InceptionTime: Finding AlexNet for Time Series Classification, arXiv:1909.04939 [cs, stat] (2019). ArXiv: 1909.04939 version: 2.
[39] J. Lines, S. Taylor, A. Bagnall, HIVE-COTE: The Hierarchical Vote Collective of Transformation-Based Ensembles for Time Series Classification, in: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1041–1046. ISSN: 2374-8486.
[40] A. Bagnall, J. Lines, J. Hills, A. Bostrom, Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles, IEEE Transactions on Knowledge and Data Engineering 27 (2015) 2522–2535.
[41] J. Brownlee, Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python, Machine Learning Mastery, 2018. Google-Books-ID: o5qnDwAAQBAJ.
[42] N. Janos, J. Roach, 1D Convolutional Neural Networks for Time Series Modeling - Nathan Ja, 2020. Library Catalog: SlideShare.
[43] M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, M. Hasan, B. C. Van Essen, A. A. S. Awwal, V. K. Asari, A State-of-the-Art Survey on Deep Learning Theory and Architectures, Electronics 8 (2019) 292.
[44] Z. Xiong, M. K. Stiles, J. Zhao, Robust ecg signal classification for detection of atrial fibrillation using a novel neural network, in: 2017 Computing in Cardiology (CinC), pp. 1–4.
[45] S. Khan, H. Rahmani, S. A. A. Shah, M. Bennamoun, G. Medioni, S. Dickinson, A Guide to Convolutional Neural Networks for Computer Vision, Morgan & Claypool, 2018.
[46] S. Saha, A Comprehensive Guide to Convolutional Neural Networks - the ELI5 way, 2018.
[47] Son Lam Phung, Abdesselam Bouzerdoum, MATLAB Library for Convolutional Neural Networks, Technical Report, Visual and Audio Signal Processing Lab, University of Wollongong, 2009.
[48] D. Stutz, Understanding Convolutional Neural Networks, Seminar Report, 2014.
[49] M. Lichman, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, 2013.
[50] CNN - Matlab, Convolutional Neural Network, retrieved from https://au.mathworks.com/solutions/deep-learning/convolutional-neural-network.html, 2019.
[51] J. Brownlee, A Gentle Introduction to Pooling Layers for Convolutional Neural Networks, retrieved from https://machinelearningmastery.com, 2019.
[52] Convolutional Neural Networks (LeNet) - DeepLearning 0.1 documentation - CNN - LeNet, Convolutional Neural Networks (LeNet), retrieved from http://deeplearning.net/tutorial/lenet.html