Max Shpak
University of Texas at Austin
Publication
Featured research published by Max Shpak.
Genomics | 2014
Max Shpak; Amelia W. Hall; Marcus M. Goldberg; Dakota Z. Derryberry; Yunyun Ni; Vishwanath R. Iyer; Matthew C. Cowperthwaite
In this paper we use eQTL mapping to identify associations between gene dysregulation and single nucleotide polymorphism (SNP) genotypes in glioblastoma multiforme (GBM). A set of 532,954 SNPs was evaluated as predictors of the expression levels of 22,279 expression probes. We identified SNPs associated with fold change in expression level rather than raw expression levels in the tumor. Following adjustment for false discovery rate, the complete set of probes yielded 9257 significant associations (p<0.05). We found 18 eQTLs that were missense mutations. Many of the eQTLs in the non-coding regions of a gene, or linked to nearby genes, had large numbers of significant associations (e.g. 321 for RNASE3, 101 for BNC2). Functional enrichment analysis revealed that the expression probes in significant associations were involved in signal transduction, transcription regulation, membrane function, and cell cycle regulation. These results suggest several loci that may serve as hubs in gene regulatory pathways associated with GBM.
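The abstract above reports associations that remain significant after adjustment for false discovery rate but does not name the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg adjustment (an assumption, not necessarily the authors' method), the step could look like this; the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used only as an illustration; the abstract does not
    say which false discovery rate procedure was applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_associations(pvalues, alpha=0.05):
    """Indices of tests whose adjusted p-value is below alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q < alpha]
```

With one p-value per SNP-probe pair, `significant_associations(pvalues)` would return the positions of the pairs that pass the 0.05 threshold after adjustment.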
PLOS ONE | 2016
Jie Lu; Matthew C. Cowperthwaite; Mark G. Burnett; Max Shpak
Glioblastoma multiforme (GBM) is the most common and aggressive adult primary brain cancer, with <10% of patients surviving for more than 3 years. Demographic and clinical factors (e.g. age) and individual molecular biomarkers have been associated with prolonged survival in GBM patients. However, comprehensive systems-level analyses of molecular profiles associated with long-term survival (LTS) in GBM patients are still lacking. We present an integrative study of molecular data and clinical variables in these long-term survivors (LTSs, patients surviving >3 years) to identify biomarkers associated with prolonged survival, and to assess the possible similarity of molecular characteristics between LGG and LTS GBM. We analyzed the relationship between multivariable molecular data and LTS in GBM patients from the Cancer Genome Atlas (TCGA), including germline and somatic point mutation, gene expression, DNA methylation, copy number variation (CNV) and microRNA (miRNA) expression using logistic regression models. The molecular relationship between GBM LTS and LGG tumors was examined through cluster analysis. We identified 13, 94, 43, 29, and 1 significant predictors of LTS using Lasso logistic regression from the somatic point mutation, gene expression, DNA methylation, CNV, and miRNA expression data sets, respectively. Individually, DNA methylation provided the best prediction performance (AUC = 0.84). Combining multiple classes of molecular data into joint regression models did not improve prediction accuracy, but did identify additional genes that were not significantly predictive in individual models. PCA and clustering analyses showed that GBM LTS typically had gene expression profiles similar to non-LTS GBM. Furthermore, cluster analysis did not identify a close affinity between LTS GBM and LGG, nor did we find a significant association between LTS and secondary GBM. 
The absence of unique LTS profiles and the lack of similarity between LTS GBM and LGG indicate that there are multiple genetic and epigenetic pathways to LTS in GBM patients.
Genomics | 2015
Max Shpak; Marcus M. Goldberg; Matthew C. Cowperthwaite
Determining which mutations drive tumor progression is a defining question in cancer genomics. We analyzed sequence evolution in glioblastoma multiforme (GBM) by computing the number of parallel mutations and by estimating ω = dN/dS, a measure of the strength and direction of selection. The ω values of almost all 7617 mutated genes in GBM are much higher than in germline genes. We identified only 21 genes under significant positive selection in GBM, as well as 29 genes under significant purifying selection, including several zinc finger proteins. Therefore, most of the high ω values in the GBM genome are due to weaker purifying selection rather than positive selection. We also found multiple recurrent mutations in GBM, several of which are associated with patient survival time. Our results suggest that convergence and neutral evolution play a significant role in GBM, and that sites with recurrent mutations can serve as molecular diagnostics of the clinical course of GBM tumors.
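To make the ω = dN/dS statistic concrete, here is a simplified sketch that uses raw proportions of substituted sites with no correction for multiple hits and no significance test. The function name, its parameters, and the numbers in the usage note are hypothetical, and this is not the estimation procedure used in the paper.

```python
def omega(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Estimate omega = dN/dS from raw counts.

    dN is the number of nonsynonymous substitutions per nonsynonymous site,
    dS the number of synonymous substitutions per synonymous site.
    Assumption: uncorrected proportions are used for simplicity.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_subs == 0:
        raise ValueError("dS is zero, so omega is undefined")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    return dn / ds
```

For example, `omega(6, 300, 4, 100)` gives 0.5. Values below 1 are conventionally read as purifying selection, values near 1 as neutral evolution, and values above 1 as positive selection.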
Cancer Research | 2018
Amelia W. Hall; Anna Battenhouse; Haridha Shivram; Adam R. Morris; Matthew C. Cowperthwaite; Max Shpak; Vishwanath R. Iyer
Glioblastoma multiforme (GBM) can be clustered by gene expression into four main subtypes associated with prognosis and survival, but enhancers and other gene-regulatory elements have not yet been identified in primary tumors. Here, we profiled six histone modifications and CTCF binding as well as gene expression in primary gliomas and identified chromatin states that define distinct regulatory elements across the tumor genome. Enhancers in mesenchymal and classical tumor subtypes drove gene expression associated with cell migration and invasion, whereas enhancers in proneural tumors controlled genes associated with a less aggressive phenotype in GBM. We identified bivalent domains marked by activating and repressive chromatin modifications. Interestingly, the gene interaction network from common (subtype-independent) bivalent domains was highly enriched for homeobox genes and transcription factors and dominated by SHH and Wnt signaling pathways. This subtype-independent signature of early neural development may be indicative of poised dedifferentiation capacity in glioblastoma and could provide potential targets for therapy. Significance: Enhancers and bivalent domains in glioblastoma are regulated in a subtype-specific manner that resembles gene regulation in glioma stem cells. Cancer Res; 78(10); 2463-74. ©2018 AACR.
Stroke | 2018
Max Shpak; Zoltan Nadasdy; Matthew C. Cowperthwaite; Anurekha Ramakrishnan; Kevin Orndorff; Christopher Fanale
Introduction: Atrial fibrillation (afib) patients are often prescribed anticoagulants to reduce stroke risk. Recently, NOACs have begun to supplant warfarin because of faster onset/offset of effect...
bioRxiv | 2017
Max Shpak; Yang Ni; Jie Lu; Peter R. Mueller
The mean pairwise genetic distance among haplotypes is an estimator of the population mutation rate θ and a standard measure of variation in a population. With the advent of next-generation sequencing (NGS) methods, this and other population parameters can be estimated under different modes of sampling. One approach is to sequence individual genomes with high coverage, and to calculate genetic distance over all sample pairs. The second approach, typically used for microbial samples or for tumor cells, is sequencing a large number of pooled genomes with very low individual coverage. With low coverage, pairwise genetic distances are calculated across independently sampled sites rather than across individual genomes. In this study, we show that the variance in genetic distance estimates is reduced with low coverage sampling if the mean pairwise linkage disequilibrium weighted by allele frequencies is positive. Practically, this means that if on average the most frequent alleles over pairs of loci are in positive linkage disequilibrium, low coverage sequencing results in improved estimates of θ, assuming similar per-site read depths. We show that this result holds under the expected distribution of allele frequencies and linkage disequilibria for an infinite sites model at mutation-drift equilibrium. From simulations, we find that the conditions for reduced variance only fail to hold in cases where variant alleles are few and at very low frequency. These results are applied to haplotype frequencies from a lung cancer tumor to compute the weighted linkage disequilibria and the expected error in estimated genetic distance using high versus low coverage.
Genetic distance is a standard measure of variation in populations. When sequencing genomes individually, genetic distances are computed over all pairs of multilocus haplotypes in a sample. However, when next-generation sequencing methods obtain reads from heterogeneous assemblages of genomes (e.g. for microbial samples in a biofilm or cells from a tumor), individual reads are often drawn from different genomes. This means that pairwise genetic distances are calculated across independently sampled sites rather than across haplotype pairs. In this paper, we show that while the expected pairwise distance under whole haplotype sampling (WHS) is the same as with independent locus sampling (ILS), the sample variances of pairwise distance differ and depend on the direction and magnitude of linkage disequilibrium (LD) among polymorphic sites. We derive a weighted LD value that, when positive, predicts higher sample variance in estimated genetic distance for WHS. Weighted LD is positive when on average, the most common alleles at two loci are in positive LD. Using individual-based simulations of an infinite sites model under Fisher-Wright genetic drift, variances of estimated genetic distance are found to be almost always higher under WHS than under ILS, suggesting a reduction in estimation error when sites are sampled independently. We apply these results to haplotype frequencies from a lung cancer tumor to compute weighted LD and the variances in estimated genetic distance under ILS vs. WHS, and find that the relative magnitudes of variances under WHS vs. ILS are sensitive to sampled allele frequencies.
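The statement that the expected pairwise distance is the same under WHS and ILS can be illustrated with a small sketch. For biallelic sites coded 0/1 (an assumption made here for simplicity), the mean pairwise distance over whole haplotypes equals the sum of per-site terms that depend only on allele counts, so linkage among sites does not enter the mean. The function names and the toy sample are hypothetical, and this is an illustration of that identity, not the paper's derivation of the variances.

```python
from itertools import combinations


def mean_pairwise_distance(haplotypes):
    """Mean number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total / len(pairs)


def per_site_distance(haplotypes):
    """Same quantity computed site by site from allele counts alone.

    Assumption: every site is biallelic and coded 0 or 1. With k copies
    of allele 1 among n haplotypes, k * (n - k) of the n * (n - 1) / 2
    pairs differ at that site.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    total = 0.0
    for s in range(num_sites):
        k = sum(h[s] for h in haplotypes)
        total += 2 * k * (n - k) / (n * (n - 1))
    return total
```

Both functions return the same value for any 0/1 sample, for instance `[(0, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 0)]`. The difference between the sampling schemes described in the abstract lies in the variance of the estimate, which this sketch does not compute.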
bioRxiv | 2017
Max Shpak; Jie Lu; Jeffrey P. Townsend
Among many organisms, offspring are constrained to occur at sites adjacent to their parents. This applies to plants and animals with limited dispersal ability, to colonies of microbes in biofilms, and to other genetically heterogeneous aggregates of cells, such as cancerous tumors. The spatial structure of such populations leads to greater relatedness among proximate individuals while increasing the genetic divergence between distant individuals. In this study, we analyze a Moran coalescent in a one-dimensional spatial model where a randomly selected individual dies and is replaced by the progeny of an adjacent neighbor in every generation. We derive a recursive system of equations using the spatial distance among haplotypes as a state variable to compute coalescent probabilities and coalescent times. The coalescent probabilities near the branch termini are smaller than in the unstructured Moran model (except for t = 1, where they are equal), corresponding to longer branch lengths and greater expected pairwise coalescent times. The lower terminal coalescent probabilities result from a spatial separation of lineages, i.e. a coalescent event between a haplotype and its neighbor in one spatial direction at time t cannot co-occur with a coalescent event with a haplotype in the opposite direction at t + 1. The concomitant increased pairwise genetic distance among randomly sampled haplotypes in spatially constrained populations could lead to incorrect inferences of recent diversifying selection or of population bottlenecks when analyzed using an unconstrained coalescent model as a null hypothesis.
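The replacement rule described above (one randomly chosen individual dies and is replaced by the progeny of an adjacent neighbor) can be sketched as a forward-time simulation. This is not the recursive coalescent system derived in the paper. It assumes sites arranged on a ring so that every site has two neighbors, which is a boundary choice made here for illustration, and all names are hypothetical.

```python
import random


def spatial_moran_step(lineage, rng):
    """Apply one death-and-replacement event in place.

    lineage[i] holds the label of the founding ancestor of site i.
    Assumption: sites form a ring, so index arithmetic wraps around.
    """
    n = len(lineage)
    dead = rng.randrange(n)
    # The replacement comes from the left or right neighbor with equal odds.
    parent = (dead + rng.choice((-1, 1))) % n
    lineage[dead] = lineage[parent]


def steps_to_common_ancestor(n, seed=0, max_steps=1_000_000):
    """Count events until all n sites descend from one founder.

    Returns None if that has not happened within max_steps.
    """
    rng = random.Random(seed)
    lineage = list(range(n))
    for step in range(1, max_steps + 1):
        spatial_moran_step(lineage, rng)
        if len(set(lineage)) == 1:
            return step
    return None
```

Each event removes at most one founder label, so at least n - 1 events are needed before a single ancestor remains. Averaging `steps_to_common_ancestor` over many seeds would give a rough forward-time view of how slowly lineages merge when replacement is restricted to neighbors.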
Theoretical Population Biology | 2017
Max Shpak; Yang Ni; Jie Lu; Peter Müller
The mean pairwise genetic distance among haplotypes is an estimator of the population mutation rate θ and a standard measure of variation in a population. With the advent of next-generation sequencing (NGS) methods, this and other population parameters can be estimated under different modes of sampling. One approach is to sequence individual genomes with high coverage, and to calculate genetic distance over all sample pairs. The second approach, typically used for microbial samples or for tumor cells, is sequencing a large number of pooled genomes with very low individual coverage. With low coverage, pairwise genetic distances are calculated across independently sampled sites rather than across individual genomes. In this study, we show that the variance in genetic distance estimates is reduced with low coverage sampling if the mean pairwise linkage disequilibrium weighted by allele frequencies is positive. Practically, this means that if on average the most frequent alleles over pairs of loci are in positive linkage disequilibrium, low coverage sequencing results in improved estimates of θ, assuming similar per-site read depths. We show that this result holds under the expected distribution of allele frequencies and linkage disequilibria for an infinite sites model at mutation-drift equilibrium. From simulations, we find that the conditions for reduced variance only fail to hold in cases where variant alleles are few and at very low frequency. These results are applied to haplotype frequencies from a lung cancer tumor to compute the weighted linkage disequilibria and the expected error in estimated genetic distance using high versus low coverage.
Journal of Molecular Evolution | 2014
Max Shpak; Luciana Girotto Gentil; Manuel Miranda
In the vertebrate central nervous system, glycinergic neurotransmission is regulated by the action of the glycine transporters 1 and 2 (GlyT1 and GlyT2)—members of the solute carrier family 6 (SLC6). Several invertebrate deuterostomes have two paralogous glycine carrier genes, with one gene in the pair having greater sequence identity and higher alignment scores with respect to GlyT1 and the other paralog showing greater similarity to GlyT2. In phylogenetic trees, GlyT2-like sequences from invertebrate deuterostomes form a monophyletic subclade with vertebrate GlyT2, while invertebrate GlyT1-like proteins constitute an outgroup to both the GlyT2-like proteins and to vertebrate GlyT1 sequences. These results are consistent with the hypothesis that vertebrate GlyT1 and GlyT2 are, respectively, derived from GlyT1- and GlyT2-like genes in invertebrate deuterostomes. This implies that the gene duplication which gave rise to these paralogs occurred prior to the origin of vertebrates. GlyT2 subsequently diverged significantly from its invertebrate orthologs (i.e., through the acquisition of a unique N-terminus) as a consequence of functional specialization, being expressed principally in the lower CNS; while GlyT1 has activity in both the lower CNS and several regions of the forebrain.
Cancer Research | 2015
Woo Suk Hong; Max Shpak; Jeffrey P. Townsend