NPJ Genomic Medicine | 2019
Large-scale population genomics versus deep phenotyping: Brute force or elegant pragmatism towards precision medicine
Abstract
Biomedical research has been accelerating at an unprecedented pace, with evidence racing towards advancing precision medicine initiatives worldwide. Genomics is at the center stage of these efforts, with extensive genetic and genomic data being continuously collected, analysed, and archived. Indeed, over the past decade, we have deepened our understanding of the underlying genetic etiologies and biologic mechanisms of both rare Mendelian and common complex human diseases. One critical issue, rightly identified by many, is the value and translational utility of the massive research-associated -omics data, particularly as related to identifying robust genotype-phenotype associations that could impact patient care and outcomes. In other words, have we so blinded ourselves with big, conglomerated data that we cannot see the trees for the forest? Large-scale population-based studies and associated consortia have played a pivotal role in laying the genomic framework of various diseased and healthy populations. Resultant data include both common and rare variants associated with different phenotypic states. The genome-wide association study (GWAS) approach has emerged as a powerful tool for identifying genomic loci associated with various common human diseases and traits. Since its inception in 2008, the GWAS Catalog has grown to include >100,000 SNP-trait associations. And while there has been great success in mapping putative common risk alleles, more research is required to pinpoint the genes involved. Relatedly, the surge in next-generation sequencing capabilities, combined with declining sequencing costs and optimized computational infrastructure, has made it practical to sequence humans at the population level. Such efforts have also led to the establishment of population-level, publicly accessible databases to facilitate data sharing and discovery.
A recent example has been presented through the Exome Aggregation Consortium (ExAC), followed by the Genome Aggregation Database (gnomAD), the largest such public catalogue, comprising 141,456 individuals sequenced as part of various disease-related or population genetic studies. Such large-scale genomic datasets of diverse human populations indeed form a critical framework for the functional interpretation of genetic variation, in both research and clinical settings. Although such a gestalt approach provides a powerful tool to home in on disease-causing variants, including ultra-rare ones, the utility of this and other population databases is naturally context-dependent. For example, germline disease-associated variants in TP53 have been found to be enriched in the ExAC and gnomAD populations. Other pathogenic variants have been found in known hereditary cancer predisposition genes such as PTEN, BRCA1, BRCA2, APC and MLH1. Based on the overall allele numbers in the interrogated populations, these rare disease-causing variants would still hypothetically represent a lower overall burden compared to a purely diseased population. The counterargument, however, is that such population databases do include individuals who have cancer (e.g. those from TCGA) or are projected to develop it, and these individuals may indeed be undiagnosed cases harboring bona fide, yet unsuspected, high-penetrance germline mutations. Understandably, the power harnessed from an ever-increasing sample size is countered by an inability to obtain individual-level genotypic or phenotypic data to tease out such associations. This is particularly important in the context of more common phenotypes such as cancer and heart disease, although efforts have been made to stratify population genetic data by global phenotypic traits (e.g. controls, absence of cancer, absence of neurological disorders, etc.).
Another pertinent challenge is the lack of universal standardization of variant interpretation, with data pointing towards high variability between computational algorithms and an inherent bias towards well-studied genetic diseases, and hence a dependence on phenotype. Ironically, our efforts to analyze big data for personalizing medicine may have resulted in the opposite, i.e., generalizations associated with populations and groups. History has shown that great clinical and scientific lessons can be learned from rare disorders. For example, germline PTEN mutations cause a subset of Cowden syndrome, yet each component cancer of this syndrome can be common in the general population or in other differential diagnoses. Importantly, somatic PTEN mutations are among the most frequent mutations across many sporadic malignancies. The discovery of PTEN as the Cowden syndrome susceptibility gene emanated from the interrogation of a focused set of five meticulously phenotyped families with individuals showing full-blown disease. Therefore, for rare Mendelian disorders, it is only pragmatic to focus on deeply phenotyped individuals to obtain the critical data that enable the practice of evidence-based, precision healthcare. Other studies have further emphasized the importance of “smart” experimental design, starting from a well-selected group of patients, perfectly matched to controls, to derive clinically relevant conclusions. It is this “smart” experimental design, coupled with well-annotated phenotypes, that led to the identification of PRDM1 in the etiology of therapy-induced second malignancies after Hodgkin’s lymphoma. Though “smart” experimental design in the setting of deep phenotyping seems commonsensical, the recent popular opinion is that power is always in the numbers. Deep phenotyping not only encompasses objectively documenting disease manifestations, but also focuses on integrating these data for a more organismal view.
Appropriately, the “human phenomic science” approach of integrating human phenotypic data with physiologic, multi-omic, and imaging data has emerged as a blueprint for precision medicine. Indeed, the notion to deeply phenotype a finite set of individuals with a particular phenotype lies