In cancer research, the heterogeneity of tumor samples complicates data analysis. As genomics has advanced, researchers have adopted new statistical methods to characterize the composition of tumor samples and improve the accuracy of predictions. One of these methods, DeMix, has become an important tool for cancer transcriptome deconvolution.
DeMix is a statistical method that predicts the likely proportions of tumor and stromal cells in a sample, using a linear mixture model to account for the heterogeneity of tumor samples.
Solid tumor samples obtained in clinical practice are typically composed of multiple clonal tumor cell populations, together with adjacent normal tissue, stroma, and infiltrating immune cells. This mixed composition makes many genomic analyses difficult and prone to bias. It is therefore important, before any analysis, to accurately estimate the purity of the tumor, that is, the percentage of cancer cells in the tumor sample.
Because cancer cells differ significantly from normal cells, researchers can estimate tumor purity from high-throughput genomic or epigenomic data. DeMix aims to estimate both the proportion of cancer cells in a mixed sample and the gene expression profile of those cells.
DeMix assumes that the mixed sample consists of only two cell types: cancer cells (with no known gene expression profile) and normal cells (with known gene expression data).
At its core, DeMix is a linear mixture model: the expression observed in a mixed sample is modeled as a weighted sum of the expression of the two cell types. The innovation of the method is that it models the mixed data before they are log-transformed, which allows gene expression and proportions in cancer cells to be predicted more effectively.
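The reason the scale matters can be seen with a small numeric illustration. The sketch below assumes that mixing is additive on the raw expression scale; the function name `mix` and the expression values are invented for illustration and do not come from DeMix itself. Applying the same weighted sum to log2 values gives a different answer, so a model fitted after log transformation no longer describes the physical mixing.

```python
import math


def mix(pi, tumor, normal):
    """Weighted sum of two values with tumor proportion pi (illustrative helper)."""
    return pi * tumor + (1.0 - pi) * normal


# Hypothetical raw expression levels for one gene in pure tumor and pure normal cells.
pi, tumor, normal = 0.5, 1024.0, 64.0

# Mixing on the raw scale, as assumed in this sketch.
raw_mixed = mix(pi, tumor, normal)

# Mixing the log2 values instead, then transforming back to the raw scale.
log_mixed = 2.0 ** mix(pi, math.log2(tumor), math.log2(normal))

print(raw_mixed)  # 544.0
print(log_mixed)  # 256.0
```

In this example the log-scale calculation understates the mixed signal by more than half, which is the kind of bias that modeling the raw measurements avoids.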
Specifically, the DeMix workflow has two main steps. The first step maximizes the likelihood function given the observed data to solve for the unknown tumor proportions and gene expression parameters of normal cells. The second step uses the tumor proportions obtained in the first step to estimate, for each sample and each gene, a pair of expression levels: one for normal cells and one for tumor cells.
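One plausible reading of the second step can be sketched as follows. Given an observed mixed value `y` and a tumor proportion `pi` from the first step, the sketch searches for the pair of normal and tumor expression levels that is most probable under the two distributions while still adding up to `y`. It assumes log2-normal expression with known parameters and uses a simple grid search; the function names, parameter values, and search strategy are illustrative assumptions, not DeMix's actual implementation.

```python
import math


def log2normal_logpdf(x, mu, sd):
    """Log density of X when log2(X) ~ Normal(mu, sd)."""
    z = (math.log2(x) - mu) / sd
    return -0.5 * z * z - math.log(x * math.log(2.0) * sd * math.sqrt(2.0 * math.pi))


def split_expression(y, pi, mu_n, sd_n, mu_t, sd_t, grid=2000):
    """Return (n, t) maximizing the joint density subject to y = pi*t + (1-pi)*n.

    Assumption: candidate normal levels are scanned on a log2 grid spanning
    five standard deviations either side of mu_n.
    """
    lo, hi = mu_n - 5.0 * sd_n, mu_n + 5.0 * sd_n
    best = None
    for k in range(grid + 1):
        n = 2.0 ** (lo + (hi - lo) * k / grid)
        t = (y - (1.0 - pi) * n) / pi  # tumor level implied by the constraint
        if t <= 0.0:
            continue  # this candidate would need negative tumor expression
        score = log2normal_logpdf(n, mu_n, sd_n) + log2normal_logpdf(t, mu_t, sd_t)
        if best is None or score > best[0]:
            best = (score, n, t)
    if best is None:
        raise ValueError("no feasible (n, t) pair for this y and pi")
    return best[1], best[2]


# Hypothetical inputs: y was built from n = 64 and t = 1024 with pi = 0.7.
n_hat, t_hat = split_expression(y=736.0, pi=0.7, mu_n=6.0, sd_n=0.2, mu_t=10.0, sd_t=0.2)
print(round(n_hat, 1), round(t_hat, 1))
```

Because every candidate pair is forced to satisfy the mixing constraint, the two estimates always reconstruct the observed value exactly; the distributions only decide how the signal is divided between the two cell types.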
DeMix was developed using the Nelder-Mead optimization procedure, which incorporates numerical integration of the joint density to achieve the required computational accuracy.
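A toy version of this kind of likelihood maximization is sketched below. It is heavily simplified: all observations share a single tumor proportion, the tumor and normal distribution parameters are treated as known, and the data are simulated. The density of a mixed value is obtained by integrating the joint density over the normal component with the midpoint rule, and a bare-bones, hand-written Nelder-Mead simplex search minimizes the negative log-likelihood. All names and values are assumptions made for illustration and are not taken from the DeMix software.

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(y, pi, mu_n, sd_n, mu_t, sd_t, points=60):
    """Density of y = pi*T + (1-pi)*N, integrating out u = log2(N) (midpoint rule)."""
    lo = mu_n - 5.0 * sd_n
    width = 10.0 * sd_n / points
    total = 0.0
    for k in range(points):
        u = lo + (k + 0.5) * width
        t = (y - (1.0 - pi) * 2.0 ** u) / pi  # tumor level implied by y and n
        if t > 0.0:
            # Density of T at t; the factor 1/pi below is the Jacobian |dt/dy|.
            f_t = normal_pdf(math.log2(t), mu_t, sd_t) / (t * math.log(2.0))
            total += normal_pdf(u, mu_n, sd_n) * (f_t / pi) * width
    return total


def nelder_mead(f, x0, step=0.1, iterations=30):
    """Bare-bones Nelder-Mead simplex search (reflect, expand, contract, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x + (step if j == i else 0.0) for j, x in enumerate(x0)] for i in range(n)
    ]
    for _ in range(iterations):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_reflected = f(reflected)
        if f_reflected < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f_reflected else reflected
        elif f_reflected < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every point halfway toward the best one
                simplex = [[b + 0.5 * (x - b) for b, x in zip(best, p)] for p in simplex]
    return min(simplex, key=f)


# Simulated data under assumed parameters (hypothetical values, fixed seed).
rng = random.Random(0)
TRUE_PI, MU_N, SD_N, MU_T, SD_T = 0.7, 6.0, 0.2, 10.0, 0.2
ys = [
    TRUE_PI * 2.0 ** rng.gauss(MU_T, SD_T) + (1.0 - TRUE_PI) * 2.0 ** rng.gauss(MU_N, SD_N)
    for _ in range(40)
]


def neg_log_likelihood(params):
    pi = params[0]
    if not 0.0 < pi < 1.0:
        return math.inf  # keep the search inside the valid range for a proportion
    return -sum(
        math.log(max(mixture_density(y, pi, MU_N, SD_N, MU_T, SD_T), 1e-300)) for y in ys
    )


pi_hat = nelder_mead(neg_log_likelihood, [0.5])[0]
print(round(pi_hat, 3))
```

Nelder-Mead suits this setting because it needs only function values, not derivatives, and the integrated likelihood has no convenient closed-form gradient. In practice a tested library routine would normally replace the hand-written search used here.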
DeMix accommodates a variety of data scenarios, whether the samples are matched and have reference genes or are unmatched and have none. This flexibility allows DeMix to be applied in a wide range of research settings.
Although the algorithm requires at least one reference gene, five to ten genes are recommended to reduce the impact of outliers and to identify an optimal set of tumor proportion estimates. DeMix assumes that the mixed sample contains at most two cellular components, normal cells and tumor cells, and relies on available data to estimate the distribution parameters of normal cells.
As data analysis technology continues to advance, methods such as DeMix are likely to be used more widely in cancer research. This could help researchers better understand tumor heterogeneity and may also lead to new treatment options, bringing greater hope to cancer patients.
In this data-driven era, one question remains open: how can future cancer research make better use of biological reference genes to improve the effectiveness and accuracy of clinical treatment?