Archive | 2019
On the Role of Hub and Orphan Genes in the Diagnosis of Breast Invasive Carcinoma
Abstract
Network information is gaining importance in the generation of predictive models in cancer genomics, on the premise that prior biological knowledge makes the models more interpretable and reproducible, an invaluable contribution to precision medicine. This work evaluates the usefulness of accounting for gene network information, provided by the data correlation structure and by external STRING information, in the classification of Breast Invasive Carcinoma (BRCA) RNA-Seq data from The Cancer Genome Atlas (TCGA) into tumor and normal tissue by sparse logistic regression (SLR). Within the correlation-based approaches, two directions were investigated: first, imposing smaller penalties on hub genes, i.e., highly connected genes in the network (hubSLR); second, favouring the selection of orphan or weakly correlated genes (orphanSLR). Without loss of predictive ability, the genes selected by the methods overlapped considerably, with fewer genes selected exclusively by each method. Besides a consensus list of genes, the complementarity offered by the sets of genes exclusively selected by each model, each based on different network information, should be regarded as a means to enhance biological interpretability, drawing attention to genes with a known role in the network: hubs, orphans, or highly connected genes in protein-protein interaction networks. This represents a major advantage over non-network-based methods, enabling the disclosure of the relevance of known gene subnetworks in the disease under study, while boosting biomarker discovery and precision medicine.
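The two correlation-based directions described above can be illustrated with a minimal sketch of how per-gene penalty factors might be derived from the data correlation structure. This is an assumption-laden illustration, not the weighting used in this work: the connectivity measure (mean absolute Pearson correlation with all other genes), the inverse-degree and degree weightings, the function names `pearson`, `connectivity` and `penalty_factors`, and the toy expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def connectivity(expr):
    """Assumed connectivity measure: mean |correlation| of each gene with all others.

    `expr` is a list of genes, each a list of expression values across samples.
    """
    p = len(expr)
    degree = [0.0] * p
    for i in range(p):
        for j in range(i + 1, p):
            r = abs(pearson(expr[i], expr[j]))
            degree[i] += r
            degree[j] += r
    return [d / (p - 1) for d in degree]


def penalty_factors(degree, mode, eps=1e-8):
    """Turn connectivity into per-gene L1 penalty factors, rescaled to mean 1.

    mode="hub":    smaller penalty for highly connected genes (assumed 1/degree).
    mode="orphan": smaller penalty for weakly connected genes (assumed degree).
    """
    if mode == "hub":
        raw = [1.0 / (d + eps) for d in degree]
    elif mode == "orphan":
        raw = [d + eps for d in degree]
    else:
        raise ValueError("mode must be 'hub' or 'orphan'")
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


# Toy data (hypothetical): genes 0-2 are strongly correlated, gene 3 is not.
g0 = [1, 2, 3, 4, 5, 6]
g1 = [2, 4, 6, 8, 10, 12]
g2 = [6, 5, 4, 3, 2, 1]
g3 = [1, -1, 1, -1, 1, -1]
expr = [g0, g1, g2, g3]

degree = connectivity(expr)
hub_w = penalty_factors(degree, "hub")        # gene 3 receives the largest penalty
orphan_w = penalty_factors(degree, "orphan")  # gene 3 receives the smallest penalty
```

Factors of this kind would multiply the L1 penalty on each coefficient in the sparse logistic regression, so that a factor below 1 makes a gene easier to select and a factor above 1 makes it harder. Solvers that accept per-feature penalty weights, such as the `penalty.factor` argument of the R package glmnet, could consume them directly.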