Zachary Zappala
Stanford University
Publication
Featured research published by Zachary Zappala.
Nucleic Acids Research | 2007
Oleg Kikin; Zachary Zappala; Lawrence D’Antonio; Paramjeet Singh Bagga
G-quadruplex motifs in the RNA play significant roles in key cellular processes and human disease. While sequences capable of forming G-quadruplexes in the pre-mRNA are involved in regulation of polyadenylation and splicing events in mammalian transcripts, the G-quadruplex motifs in the UTRs may help regulate mRNA expression. GRSDB2 is a second-generation database containing information on the composition and distribution of putative Quadruplex-forming G-Rich Sequences (QGRS) mapped in ∼29 000 eukaryotic pre-mRNA sequences, many of which are alternatively processed. The data stored in the GRSDB2 is based on computational analysis of NCBI Entrez Gene entries with the help of an improved version of the QGRS Mapper program. The database allows complex queries with a wide variety of parameters, including Gene Ontology terms. The data is displayed in a variety of formats with several additional computational capabilities. We have also developed a new database, GRS_UTRdb, containing information on the composition and distribution patterns of putative QGRS in the 5′- and 3′-UTRs of eukaryotic mRNA sequences. The goal of these experiments has been to build freely accessible resources for exploring the role of G-quadruplex structure in regulation of gene expression at post-transcriptional level. The databases can be accessed at the G-Quadruplex Resource Site at: http://bioinformatics.ramapo.edu/GQRS/.
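The abstract describes mapping putative QGRS. As a rough illustration, the sketch below scans a sequence for the general pattern G_x N_y1 G_x N_y2 G_x N_y3 G_x, meaning four G-runs of equal length separated by three loops. It is a simplified regex sketch, not the QGRS Mapper implementation. The function name `find_qgrs` and the defaults for tetrad count, loop length and maximum motif length are assumptions, and no G-score is computed.

```python
import re


def find_qgrs(seq, min_tetrads=2, max_tetrads=5, max_loop=12, max_len=30):
    """Return (start, motif, tetrads) tuples for putative G-quadruplex motifs.

    Hypothetical defaults: 2-5 stacked G-tetrads, loops of 1..max_loop
    nucleotides, whole motif at most max_len nucleotides long.
    """
    seq = seq.upper()
    hits = []
    for x in range(min_tetrads, max_tetrads + 1):
        run = "G{%d}" % x                # one G-run of exactly x guanines
        loop = ".{1,%d}?" % max_loop     # lazy loop of 1..max_loop bases
        motif_re = loop.join([run] * 4)  # G_x N G_x N G_x N G_x
        # The lookahead lets finditer report overlapping motifs; at each start
        # the engine keeps loops as short as possible, scanning left to right.
        for m in re.finditer("(?=(%s))" % motif_re, seq):
            motif = m.group(1)
            if len(motif) <= max_len:
                hits.append((m.start(), motif, x))
    return hits


print(find_qgrs("GGAGGAGGAGG"))  # [(0, 'GGAGGAGGAGG', 2)]
```

Because matching is purely lexical, the scan also reports shifted hits inside longer G-runs. A real mapper would collapse or score these overlapping candidates.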
Nature | 2017
Xin Li; Yungil Kim; Emily K. Tsang; Joe R. Davis; Farhan N. Damani; Colby Chiang; Gaelen T. Hess; Zachary Zappala; Benjamin J. Strober; Alexandra J. Scott; Amy Li; Andrea Ganna; Michael C. Bassik; Jason D. Merker; Ira M. Hall; Alexis Battle; Stephen B. Montgomery
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
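One way to picture multi-tissue expression outliers is to standardize a gene's expression within each tissue. An individual is then flagged when their median Z-score across tissues is extreme. The minimal sketch below assumes a |median Z| ≥ 2 threshold and assumes every individual has a value in every tissue. The names `zscore` and `multi_tissue_outliers` are hypothetical, and real GTEx data needs handling for missing tissues and for confounders. This is not the RIVER model.

```python
from statistics import mean, median, pstdev


def zscore(values):
    """Standardize one gene's expression across individuals within a tissue."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # no variation in this tissue
    return [(v - mu) / sd for v in values]


def multi_tissue_outliers(expr_by_tissue, threshold=2.0):
    """Flag individuals whose median Z-score across tissues is extreme.

    expr_by_tissue maps tissue name -> list of expression values for one gene,
    with the same individual at the same index in every list (an assumption).
    Returns {individual_index: median_z} for |median_z| >= threshold.
    """
    z_by_tissue = [zscore(vals) for vals in expr_by_tissue.values()]
    outliers = {}
    for i in range(len(z_by_tissue[0])):
        med = median([z[i] for z in z_by_tissue])
        if abs(med) >= threshold:
            outliers[i] = med
    return outliers
```

A positive median Z-score corresponds to an overexpression outlier and a negative one to an underexpression outlier. Those outliers can then be checked for nearby rare variants.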
Genome Research | 2016
Kimberly R. Kukurba; Princy Parsana; Brunilda Balliu; Kevin S. Smith; Zachary Zappala; David A. Knowles; Marie Julie Favé; Joe R. Davis; Xin Li; Xiaowei Zhu; James B. Potash; Myrna M. Weissman; Jianxin Shi; Anshul Kundaje; Douglas F. Levinson; Alexis Battle; Stephen B. Montgomery
The X Chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. Improving our understanding of these differences could help elucidate the molecular mechanisms underlying sex-specific traits and diseases. However, to date, most studies have either ignored the X Chromosome or had insufficient power to test for the sex-specific impact of genetic variation. By analyzing whole blood transcriptomes of 922 individuals, we have conducted the first large-scale, genome-wide analysis of the impact of both sex and genetic variation on patterns of gene expression, including comparison between the X Chromosome and autosomes. We identified a depletion of expression quantitative trait loci (eQTL) on the X Chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X Chromosome. To resolve the molecular mechanisms underlying such effects, we generated chromatin accessibility data through ATAC-sequencing to connect sex-specific chromatin accessibility to sex-specific patterns of expression and regulatory variation. As sex-specific regulatory variants discovered in our study can inform sex differences in heritable disease prevalence, we integrated our data with genome-wide association study data for multiple immune traits, identifying several traits with significant sex biases in genetic susceptibilities. Together, our study provides genome-wide insight into how genetic variation, the X Chromosome, and sex shape human gene regulation and disease.
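A simple way to illustrate a sex-specific regulatory effect is to fit the eQTL slope separately in females and males. The two slopes are then compared with a Z-statistic. This stratified comparison is a swapped-in, simplified stand-in and not necessarily the model used in the study. The names `ols_slope` and `sex_difference_z` are hypothetical, and the sketch ignores covariates such as ancestry or batch.

```python
from math import sqrt


def ols_slope(x, y):
    """Simple linear regression y ~ a + b*x; returns (b, standard error of b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, sqrt(rss / (n - 2) / sxx)


def sex_difference_z(geno_f, expr_f, geno_m, expr_m):
    """Z-statistic for a difference in genotype effect between females and males.

    geno_*: genotype dosages (0/1/2); expr_*: expression values, same order.
    """
    b_f, se_f = ols_slope(geno_f, expr_f)
    b_m, se_m = ols_slope(geno_m, expr_m)
    return (b_f - b_m) / sqrt(se_f ** 2 + se_m ** 2)
```

A large |Z| suggests the variant's effect on expression differs between the sexes. In practice this would be corrected for the number of variant-gene pairs tested.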
Genetics in Medicine | 2018
Jason D. Merker; Aaron M. Wenger; Tam P. Sneddon; Megan E. Grove; Zachary Zappala; Laure Frésard; Daryl Waggott; Sowmi Utiramerur; Yanli Hou; Kevin S. Smith; Stephen B. Montgomery; Matthew T. Wheeler; Jillian G. Buchan; Christine Lambert; Kevin Eng; Luke Hickey; Jonas Korlach; James M. Ford; Euan A. Ashley
Purpose: Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS.
Methods: We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative.
Results: This LRS approach yielded 6,971 deletions and 6,821 insertions > 50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants.
Conclusion: This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS.
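The filtering step in the Results (variants > 50 bp, absent in an unrelated control, overlapping a disease-gene coding exon) can be sketched as below. Several parts are assumptions: the `SV` record, the exon tuple layout and `candidate_svs` are hypothetical. "Absent in the control" is simplified to "no identical call", whereas real pipelines usually match calls by overlap. The coordinates in any usage would be made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int   # 0-based, half-open [start, end); an insertion spans [pos, pos + 1)
    end: int
    kind: str    # "DEL" or "INS"
    length: int  # number of deleted or inserted bases


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def candidate_svs(patient, control, exons, min_len=50):
    """Keep patient SVs longer than min_len that are absent from the control
    and overlap a coding exon. exons: list of (chrom, start, end, gene)."""
    control_calls = {(sv.chrom, sv.start, sv.end, sv.kind) for sv in control}
    hits = []
    for sv in patient:
        if sv.length <= min_len:
            continue  # too small to count as a structural variant here
        if (sv.chrom, sv.start, sv.end, sv.kind) in control_calls:
            continue  # also seen in the unrelated control
        for chrom, start, end, gene in exons:
            if chrom == sv.chrom and overlaps(sv.start, sv.end, start, end):
                hits.append((sv, gene))
                break  # report each SV once, against the first exon it hits
    return hits
```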
bioRxiv | 2016
François Aguet; Andrew Anand Brown; Stephane E. Castel; Joe R. Davis; Pejman Mohammadi; Ayellet V. Segrè; Zachary Zappala; Nathan S. Abell; Laure Frésard; Eric R. Gamazon; Ellen T. Gelfand; Michael J. Gloudemans; Yuan He; Farhad Hormozdiari; Xiao Li; Xin Li; Boxiang Liu; Diego Garrido-Martín; Halit Ongen; John Palowitch; YoSon Park; Christine B. Peterson; Gerald Quon; Stephan Ripke; Andrey A. Shabalin; Tyler C. Shimko; Benjamin J. Strober; Timothy J. Sullivan; Nicole A. Teran; Emily K. Tsang
Expression quantitative trait locus (eQTL) mapping provides a powerful means to identify functional variants influencing gene expression and disease pathogenesis. We report the identification of cis-eQTLs from 7,051 post-mortem samples representing 44 tissues and 449 individuals as part of the Genotype-Tissue Expression (GTEx) project. We find a cis-eQTL for 88% of all annotated protein-coding genes, with one-third having multiple independent effects. We identify numerous tissue-specific cis-eQTLs, highlighting the unique functional impact of regulatory variation in diverse tissues. By integrating large-scale functional genomics data and state-of-the-art fine-mapping algorithms, we identify multiple features predictive of tissue-specific and shared regulatory effects. We improve estimates of cis-eQTL sharing and effect sizes using allele-specific expression across tissues. Finally, we demonstrate the utility of this large compendium of cis-eQTLs for understanding the tissue-specific etiology of complex traits, including coronary artery disease. The GTEx project provides an exceptional resource that has improved our understanding of gene regulation across tissues and the role of regulatory variation in human genetic diseases.
Pacific Symposium on Biocomputing | 2013
Roxana Daneshjou; Zachary Zappala; Kimberly R. Kukurba; Sean M. Boyle; Kelly E. Ormond; Teri E. Klein; Michael Snyder; Carlos Bustamante; Russ B. Altman; Stephen B. Montgomery
The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowdsource the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.
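The core annotation step can be pictured as a lookup. Each variant in a genome is checked against ClinVar records, and the variant is kept only if it is labeled pathogenic and falls in a gene on the ACMG list. The sketch below is illustrative only and not PATH-SCAN's code. The `flag_actionable` function, the tuple layouts and the exact "Pathogenic" label match are assumptions. It also assumes the genome and ClinVar use the same reference build and allele normalization.

```python
def flag_actionable(variants, clinvar, acmg_genes):
    """Return (variant, gene) pairs for ClinVar-pathogenic variants in ACMG genes.

    variants:   iterable of (chrom, pos, ref, alt) tuples from one genome or exome
    clinvar:    dict mapping (chrom, pos, ref, alt) -> (gene, clinical_significance)
    acmg_genes: set of gene symbols from the ACMG incidental-findings list
    """
    flagged = []
    for variant in variants:
        record = clinvar.get(variant)
        if record is None:
            continue  # variant has no ClinVar record
        gene, significance = record
        if significance == "Pathogenic" and gene in acmg_genes:
            flagged.append((variant, gene))
    return flagged
```

Flagged variants are candidates for review with a physician or genetic counselor, not diagnoses.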
Human Mutation | 2017
Kristin D. Kernohan; Laure Frésard; Zachary Zappala; Taila Hartley; Kevin S. Smith; Justin D. Wagner; Hongbin Xu; Arran McBride; Pierre R. Bourque; Steffany A. L. Bennett; David A. Dyment; Kym M. Boycott; Stephen B. Montgomery; Jodi Warman Chardon
At least 15% of the disease‐causing mutations affect mRNA splicing. Many splicing mutations are missed in a clinical setting due to limitations of in silico prediction algorithms or their location in noncoding regions. Whole‐transcriptome sequencing is a promising new tool to identify these mutations; however, it will be a challenge to obtain disease‐relevant tissue for RNA. Here, we describe an individual with a sporadic atypical spinal muscular atrophy, in whom clinical DNA sequencing reported one pathogenic ASAH1 mutation (c.458A>G;p.Tyr153Cys). Transcriptome sequencing on patient leukocytes identified a highly significant and atypical ASAH1 isoform not explained by c.458A>G (p < 10⁻¹⁶). Subsequent Sanger‐sequencing identified the splice mutation responsible for the isoform (c.504A>C;p.Lys168Asn) and provided a molecular diagnosis of autosomal‐recessive spinal muscular atrophy with progressive myoclonic epilepsy. Our findings demonstrate the utility of RNA sequencing from blood to identify splice‐impacting disease mutations for nonhematological conditions, providing a diagnosis for these otherwise unsolved patients.
Human Heredity | 2016
Zachary Zappala; Stephen B. Montgomery
Whole-genome and exome sequencing in human populations has revealed the tolerance of each gene for loss-of-function variation. By understanding this tolerance, it has become increasingly possible to identify genes that would make safe therapeutic targets and to identify rare genetic risk factors and phenotypes at the scale of individual genomes. To date, the vast majority of surveyed loss-of-function variants are in protein-coding regions of the genome mainly due to the focus on these regions by exome-based sequencing projects and their relative ease of interpretability. As whole-genome sequencing becomes more prevalent, new strategies will be required to uncover impactful variation in non-coding regions of the genome where the architecture of genome function is more complex. In this review, we investigate recent studies of loss-of-function variation and emerging approaches for interpreting whole-genome sequencing data to identify rare and impactful non-coding loss-of-function variants.
Nature Genetics | 2017
Mauro Pala; Zachary Zappala; Mara Marongiu; Xin Li; Joe R. Davis; Roberto Cusano; Francesca Crobu; Kimberly R. Kukurba; Michael J. Gloudemans; Frederic Reinier; Riccardo Berutti; Maria Grazia Piras; Antonella Mulas; Magdalena Zoledziewska; Michele Marongiu; Elena P. Sorokin; Gaelen T. Hess; Kevin S. Smith; Fabio Busonero; Andrea Maschio; Maristella Steri; Carlo Sidore; Serena Sanna; Edoardo Fiorillo; Michael C. Bassik; Stephen Sawcer; Alexis Battle; John Novembre; Chris Jones; Andrea Angius
Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
bioRxiv | 2016
Mauro Pala; Zachary Zappala; Mara Marongiu; Xin Li; Joe R. Davis; Roberto Cusano; Francesca Crobu; Kimberly R. Kukurba; Frederic Reinier; Riccardo Berutti; Maria Grazia Piras; Antonella Mulas; Magdalena Zoledziewska; Michele Marongiu; Fabio Busonero; Andrea Maschio; Maristella Steri; Carlo Sidore; Serena Sanna; Edoardo Fiorillo; Alexis Battle; John Novembre; Chris Jones; Andrea Angius; Gonçalo R. Abecasis; David Schlessinger; Francesco Cucca; Stephen B. Montgomery
Identifying functional non-coding variants can enhance genome interpretation and inform novel genetic risk factors. We used whole genomes and peripheral white blood cell transcriptomes from 624 Sardinian individuals to identify non-coding variants that contribute to population, family, and individual differences in transcript abundance. We identified 21,183 independent expression quantitative trait loci (eQTLs) and 6,768 independent splicing quantitative trait loci (sQTLs) influencing 73% and 41% of all tested genes, respectively. When we compared Sardinian eQTLs to those previously identified in Europe, we identified differentiated eQTLs at genes involved in malarial resistance and multiple sclerosis, reflecting the long-term epidemiological history of the island’s population. Taking advantage of pedigree data for the population sample, we identified segregating patterns of outlier gene expression and allelic imbalance in 61 Sardinian trios. We identified 809 expression outliers (median z-score of 2.97) averaging 13.3 genes with outlier expression per individual. We then connected these outlier expression events to rare non-coding variants. Our results provide new insight into the effects of non-coding variants and their relationship to population history, traits and individual genetic risk.
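Allelic imbalance at a heterozygous site is commonly assessed by asking whether RNA-seq reads are split between the reference and alternative alleles differently from 50:50. The minimal sketch below uses an exact two-sided binomial test for this. It is offered as a generic illustration, not necessarily the test used in this study. The names `binomial_two_sided_p` and `allelic_imbalance_p` are hypothetical, and the sketch ignores reference-mapping bias and overdispersion.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no more
    likely than observing k successes in n trials."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance so exact ties are counted
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allelic_imbalance_p(ref_reads, alt_reads):
    """P-value for departure from a 50:50 ref/alt read ratio at a heterozygous site."""
    return binomial_two_sided_p(ref_reads, ref_reads + alt_reads)
```

For example, 0 reference reads out of 10 gives p = 2/1024 ≈ 0.002, because only the all-reference and all-alternative outcomes are as unlikely as that observation.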