In cancer research, understanding tumor heterogeneity is crucial for accurately predicting treatment response and outcome. Accurately estimating the ratio of cancer cells to normal cells in a sample can make diagnosis and treatment more targeted and effective. The DeMix method developed by Ahn et al. offers a solution to this challenge: it is a statistical method that deconvolves mixed cancer transcriptomes to estimate the likely proportions of tumor and stromal cells in a sample.
Solid tumor samples are often composed of multiple clonal cancer cell populations, adjacent normal tissue, stroma, and infiltrating immune cells, which makes them highly heterogeneous.
This heterogeneity complicates many genomic data analyses and may introduce bias. Accounting for it in mixed samples, and in particular for tumor purity, i.e., the percentage of cancer cells in a tumor sample, is therefore an important task. Purity estimation relies on high-throughput genomic or epigenomic data, because the striking differences between cancer cells and normal cells are what make it possible to estimate tumor purity.
The DeMix method provides a new strategy for clinical transcriptomics by analyzing the proportion and gene expression characteristics of cancer cells in mixed samples.
DeMix considers four scenarios: matched tumor and normal samples, with or without reference genes, and unmatched tumor and normal samples, again with or without reference genes. Here, reference genes are genes whose expression profiles have been accurately estimated from external data covering all constituent tissue types.
DeMix assumes that the mixed sample consists of only two types of cells: cancer cells (with unknown gene expression profiles) and normal cells (with known gene expression profiles, which can come from matched or unmatched samples). The method is particularly relevant for microarray data analysis, where it takes raw data as input rather than the log-transformed data used by other methods.
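The two-component assumption can be written as a linear mixture on the raw scale, y = pi * t + (1 - pi) * n, where pi is the tumor fraction. The following is a minimal sketch of why the choice of scale matters, not DeMix itself; the function names and the expression values are illustrative assumptions.

```python
import math

def mix_raw(tumor, normal, tumor_fraction):
    """Linear mixing of raw expression for one gene: y = pi*t + (1-pi)*n."""
    return tumor_fraction * tumor + (1.0 - tumor_fraction) * normal

def mix_log_naive(tumor, normal, tumor_fraction):
    """Same weights applied to log2 values, then mapped back to the raw scale."""
    log_mix = (tumor_fraction * math.log2(tumor)
               + (1.0 - tumor_fraction) * math.log2(normal))
    return 2.0 ** log_mix

# Hypothetical raw expression values for a single gene.
tumor_expr, normal_expr, pi = 1024.0, 64.0, 0.5

raw = mix_raw(tumor_expr, normal_expr, pi)          # weighted arithmetic mean
naive = mix_log_naive(tumor_expr, normal_expr, pi)  # weighted geometric mean
print(raw, naive)  # 544.0 256.0
```

Mixing weights applied to log-transformed values yield a geometric rather than an arithmetic mean, so a model that assumes linear mixing on the log scale would describe a different quantity from the one a physically mixed sample produces.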
Specifically, DeMix first uses maximum likelihood estimation to estimate the gene expression parameters and the proportion of tumor cells. Then, given these estimates, it estimates the normal and tumor expression levels for each sample and gene.
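The two steps can be illustrated with a deliberately simplified sketch. It assumes Gaussian noise on the raw scale and a set of hypothetical genes whose tumor and normal expression are both known; this stands in for DeMix's actual likelihood, which is not reproduced here. All names and numbers are assumptions made for illustration.

```python
def log_likelihood(pi, mixed, tumor_ref, normal_ref, sigma=1.0):
    """Gaussian log-likelihood (up to a constant) of y = pi*t + (1-pi)*n + noise."""
    total = 0.0
    for y, t, n in zip(mixed, tumor_ref, normal_ref):
        residual = y - (pi * t + (1.0 - pi) * n)
        total += -0.5 * (residual / sigma) ** 2
    return total

def estimate_tumor_fraction(mixed, tumor_ref, normal_ref, steps=1000):
    """Step 1: pick the pi on a grid in (0, 1] that maximizes the likelihood."""
    grid = [k / steps for k in range(1, steps + 1)]
    return max(grid, key=lambda pi: log_likelihood(pi, mixed, tumor_ref, normal_ref))

def deconvolve_tumor(mixed_value, normal_value, pi):
    """Step 2: solve y = pi*t + (1-pi)*n for the tumor expression t."""
    return (mixed_value - (1.0 - pi) * normal_value) / pi

# Hypothetical genes with known tumor and normal expression (raw scale).
tumor_ref = [200.0, 50.0, 400.0]
normal_ref = [100.0, 150.0, 100.0]
true_pi = 0.7
mixed_ref = [true_pi * t + (1 - true_pi) * n for t, n in zip(tumor_ref, normal_ref)]

pi_hat = estimate_tumor_fraction(mixed_ref, tumor_ref, normal_ref)

# Another gene: normal expression known, tumor expression unknown.
tumor_hat = deconvolve_tumor(mixed_value=380.0, normal_value=100.0, pi=pi_hat)
print(pi_hat, tumor_hat)
```

With noise-free simulated data the grid search recovers the simulated tumor fraction, and the second step then inverts the mixture equation gene by gene. Real data would need a noise model and per-gene variance estimates.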
The method analyzes data from heterogeneous tumor samples and estimates gene expression levels before the data are log-transformed. Working on the raw scale, where expression from the two cell types combines additively, is what the method credits for its improved accuracy.
DeMix is flexible and covers four data scenarios: with or without reference genes, and with or without matched samples. Although the algorithm requires at least one reference gene, using at least 5 to 10 is recommended to mitigate the potential impact of outliers when estimating the mixing proportions.
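Why several reference genes help can be seen in a small sketch. It assumes each reference gene has known tumor and normal expression and combines per-gene estimates with a median; the median is an illustrative choice, not the procedure DeMix uses, and all values are hypothetical.

```python
from statistics import median

def per_gene_fraction(mixed, tumor, normal):
    """Solve y = pi*t + (1-pi)*n for pi using a single reference gene."""
    return (mixed - normal) / (tumor - normal)

# Hypothetical reference genes as (mixed, tumor, normal) raw expression triples.
# The first four are consistent with a tumor fraction of 0.6; the last is an outlier.
reference_genes = [
    (160.0, 200.0, 100.0),
    (90.0, 50.0, 150.0),
    (280.0, 400.0, 100.0),
    (220.0, 300.0, 100.0),
    (190.0, 200.0, 100.0),  # outlier measurement
]

estimates = [per_gene_fraction(y, t, n) for y, t, n in reference_genes]
single_gene_estimate = estimates[-1]    # what one unlucky gene alone would give
combined_estimate = median(estimates)   # robust to the single outlier
print(single_gene_estimate, combined_estimate)
```

Relying on the outlier gene alone would put the tumor fraction near 0.9, whereas combining five genes keeps the estimate at the value the other four agree on.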
In practical applications, especially when processing high-throughput data, the advantages of DeMix become more apparent. A joint model could estimate all parameters simultaneously, but its computational cost may make it unsuitable for large-scale datasets.
By choosing the DeMix scenario that fits their data, clinical researchers can more accurately analyze and interpret the biology of cancer samples.
Overall, DeMix provides an efficient computational approach to the challenges posed by tumor heterogeneity. The method not only improves our understanding of how cancer cells and normal cells make up a sample, but also offers new perspectives for future cancer research and treatment. As technology advances, further improving the accuracy of DeMix and extending it to more complex tumor microenvironments will be an important topic in tumor biology research. What new developments do you think this research will bring?