Computers in biology and medicine | 2021

Application of decision tree-based ensemble learning in the classification of breast cancer

 
 

Abstract


Fine Needle Aspiration Biopsy (FNAB) of suspicious breast lumps is a common screening and diagnostic tool for distinguishing malignant from benign breast cytology. In this study, we first review published works on breast cancer classification in which machine learning and data mining algorithms have been applied to the Wisconsin Breast Cancer Database (WBCD). We then introduce new tools, based on the Random Forest (RF) and Extremely Randomized Trees, or Extra Trees (ET), algorithms, to classify breast cancer. Both RF and ET use decision trees as base classifiers and combine their outputs to reach the final classification. Each approach comprises four main stages: input identification, determination of the optimal number of trees, voting analysis, and final decision. The models implemented in this research take factors such as uniformity of cell size, bland chromatin, mitoses, and clump thickness as input parameters. According to the statistical analysis, the proposed methods classify the type of breast cancer accurately. The error analysis shows that the designed RF and ET models offer easy-to-use outcomes and the highest diagnostic performance compared with previous tools and models in the literature for WBCD classification. Among the input factors, uniformity of cell size has the highest relative importance and mitoses the lowest. RF and ET algorithms are expected to play an important role in screening and diagnosis in medicine and health systems in the near future.
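The four stages named in the abstract can be illustrated with a small, standard-library-only sketch. This is not the authors' implementation: the data are synthetic stand-ins for WBCD rows, each "tree" is a depth-1 stump with a randomly drawn feature and threshold (a simplification in the spirit of ET), and all function names (`make_toy_data`, `fit_random_stump`, `choose_n_trees`, `predict`) are hypothetical. A real study would more likely rely on a library such as scikit-learn's `RandomForestClassifier` and `ExtraTreesClassifier`.

```python
import random
from collections import Counter

# Stage 1: input identification. These are the four inputs named in the
# abstract; the 1-10 scoring range is an assumption of this toy example.
FEATURES = ["clump_thickness", "uniformity_of_cell_size",
            "bland_chromatin", "mitoses"]

def make_toy_data(n, rng):
    """Synthetic rows (NOT real WBCD data): malignant rows skew high."""
    rows = []
    for _ in range(n):
        label = rng.choice(["benign", "malignant"])
        lo, hi = (1, 5) if label == "benign" else (4, 10)
        rows.append(([rng.randint(lo, hi) for _ in FEATURES], label))
    return rows

def fit_random_stump(rows, rng):
    """Fit one depth-1 tree with a randomly drawn feature and threshold."""
    j = rng.randrange(len(FEATURES))
    t = rng.uniform(1, 10)
    sides = {True: Counter(), False: Counter()}
    for x, y in rows:
        sides[x[j] <= t][y] += 1
    # An empty side falls back to the overall majority class.
    overall = Counter(y for _, y in rows).most_common(1)[0][0]
    leaf = {s: (c.most_common(1)[0][0] if c else overall)
            for s, c in sides.items()}
    return lambda x: leaf[x[j] <= t]

def predict(trees, x):
    """Stages 3 and 4: collect one vote per tree, return the majority class."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

def accuracy(trees, rows):
    return sum(predict(trees, x) == y for x, y in rows) / len(rows)

def choose_n_trees(train, valid, candidates, seed=0):
    """Stage 2: keep the ensemble size with the best validation accuracy."""
    best = None
    for n in candidates:
        rng = random.Random(seed)
        trees = [fit_random_stump(train, rng) for _ in range(n)]
        score = accuracy(trees, valid)
        if best is None or score > best[1]:
            best = (n, score, trees)
    return best

rng = random.Random(42)
train, valid = make_toy_data(200, rng), make_toy_data(100, rng)
# Odd candidate sizes avoid tied votes between the two classes.
n_trees, score, trees = choose_n_trees(train, valid, [1, 5, 25, 101])
print(n_trees, round(score, 3), predict(trees, [9, 9, 8, 7]))
```

The ensemble size is chosen on a held-out validation split so that the selection does not reuse the rows the trees were fitted on; the candidate sizes listed here are arbitrary illustrations, not values from the paper.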

Volume 128
Pages 104089
DOI 10.1016/j.compbiomed.2020.104089
Language English
Journal Computers in biology and medicine
