IEEE Access | 2021

An Effective Tumor Classification With Deep Forest and Self-Training

 
 
 

Abstract


In recent years, tumor classification based on data from the Gene Expression Omnibus has attracted sustained attention in bioinformatics. Ensemble machine learning techniques are an efficient way to address this problem. In general, supervised learning needs a large number of labelled samples to perform well. In many cases, however, the training database contains only a few labelled samples alongside abundant unlabelled ones, and labelling the unlabelled samples manually is difficult and expensive. Semi-supervised learning approaches have therefore been proposed to exploit unlabelled samples and improve model performance. However, noisy samples reduce the robustness of a model in semi-supervised learning. Ideally, training should proceed from high-confidence to low-confidence samples; self-training meets this requirement, and the deep forest approach, with the hyper-parameter settings used in this work, achieves good accuracy. In this paper, we therefore present a novel semi-supervised learning approach built on a deep forest model to improve tumor classification performance. The approach exploits unlabelled samples while minimizing the cost of labelling: a mechanism for updating the unlabelled sample set is investigated to expand the number of high-confidence pseudo-labelled samples. Experiments on multiple real-world datasets indicate that the proposed approach achieves up to 0.96 in accuracy and F1-score and up to 0.9798 in AUC.
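The self-training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner here is a toy nearest-centroid classifier standing in for the deep forest, and the names `self_train`, `threshold`, and `max_rounds` are hypothetical, as is the fixed confidence threshold of 0.9. The sketch only shows the general loop in which high-confidence predictions on unlabelled samples are promoted to pseudo-labels and the model is refit.

```python
import math


def centroids(X, y):
    """Fit the toy base learner: one mean feature vector per class.
    (Assumption: a stand-in for the deep forest used in the paper.)"""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_proba(model, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {c: -math.dist(x, m) for c, m in model.items()}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Generic self-training loop (hypothetical parameters).

    Each round, unlabelled samples predicted with confidence >= threshold
    are moved into the labelled set with their pseudo-label, then the
    model is refit. The loop stops when no sample qualifies.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = centroids(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)  # too uncertain, keep unlabelled
        pool = remaining
        if added == 0:
            break
        model = centroids(X_lab, y_lab)  # refit on the enlarged labelled set
    return model, pool


# Toy usage: two labelled points, three unlabelled points.
model, pool = self_train(
    [[0.0, 0.0], [10.0, 10.0]], [0, 1],
    [[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]],
)
```

In this toy run, the two unlabelled points close to a class centroid receive pseudo-labels, while the equidistant point stays in the unlabelled pool. Lowering `threshold` over successive rounds would be one way to approximate the high- to low-confidence ordering the abstract mentions, though the paper's actual update mechanism may differ.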

Volume 9
Pages 100944-100950
DOI 10.1109/ACCESS.2021.3096241
Language English
Journal IEEE Access
