Publication


Featured research published by Zhan Ye.


Genes and Immunity | 2013

A PheWAS approach in studying HLA-DRB1*1501

Scott J. Hebbring; Steven J. Schrodi; Zhan Ye; Zhiyi Zhou; David C. Page; Murray H. Brilliant

HLA-DRB1 encodes a major histocompatibility complex class II cell surface receptor. Genetic variants in and around this gene have been linked to numerous autoimmune diseases; most notably, an association between the HLA-DRB1*1501 haplotype and multiple sclerosis (MS) has been established. Using electronic health records for 4235 individuals in Marshfield Clinic’s Personalized Medicine Research Project, a reverse genetic screen termed a phenome-wide association study (PheWAS) tested the association of rs3135388 genotype (tagging HLA-DRB1*1501) with 4841 phenotypes. As expected, HLA-DRB1*1501 was associated with MS (International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9) code 340, P=0.023), whereas the strongest association was with alcohol-induced cirrhosis of the liver (ICD9 571.2, P=0.00011). HLA-DRB1*1501 was also associated with erythematous conditions (ICD9 695, P=0.0054) and benign neoplasms of the respiratory and intrathoracic organs (ICD9 212, P=0.042), replicating previous findings. This study builds on the feasibility and utility of the PheWAS approach and represents the first external validation of a PheWAS; it may also demonstrate the complex etiologies associated with the HLA-DRB1*1501 locus.


Nature Biotechnology | 2015

Opportunities for drug repositioning from phenome-wide association studies

Majid Rastegar-Mojarad; Zhan Ye; Jill M. Kolesar; Scott J. Hebbring; Simon Lin


Frontiers in Genetics | 2014

Genetic-based prediction of disease traits: Prediction is very difficult, especially about the future

Steven J. Schrodi; Shubhabrata Mukherjee; Ying Shan; Gerard Tromp; John J. Sninsky; Amy P. Callear; Tonia C. Carter; Zhan Ye; Jonathan L. Haines; Murray H. Brilliant; Paul K. Crane; Diane T. Smelser; Robert C. Elston; Daniel E. Weeks

Translating genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area, with an ongoing accumulation of studies that identify disease-predisposing genotypes. Several statistical approaches for predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications.


European Journal of Human Genetics | 2015

Phenome-wide association studies (PheWASs) for functional variants

Zhan Ye; John E. Mayer; Lynn Ivacic; Zhiyi Zhou; Min He; Steven J. Schrodi; David C. Page; Murray H. Brilliant; Scott J. Hebbring

The genome-wide association study (GWAS) is a powerful approach for studying the genetic complexities of human disease. Unfortunately, GWASs often fail to identify clinically significant associations, and describing the function of associated variants can be challenging. GWAS is a phenotype-to-genotype approach. It is now possible to conduct the converse, genotype-to-phenotype approach by using extensive electronic medical records to define a phenome. This approach associates a single genetic variant with many phenotypes across the phenome and is called a phenome-wide association study (PheWAS). Most PheWASs conducted to date have focused on variants previously identified by GWASs. This approach has been efficient for rediscovering gene–disease associations while also identifying pleiotropic effects for some single-nucleotide polymorphisms (SNPs). However, the use of GWAS-identified SNPs in a PheWAS is limited by the inherent properties of those SNPs, including weak effect sizes and the difficulty of translating discoveries into function. To address these challenges, we conducted a PheWAS on 105 presumed functional stop-gain and stop-loss variants genotyped in 4235 Marshfield Clinic patients. Associations were validated in an additional 10 640 Marshfield Clinic patients. PheWAS results indicate that a nonsense variant in ARMS2 (rs2736911) is associated with age-related macular degeneration (AMD). These results demonstrate that focusing on functional variants may be an effective approach when conducting a PheWAS.
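The genotype-to-phenotype scan described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes each phenotype is already available as a set of case patient IDs (for example, patients carrying a given ICD9 code), treats every other genotyped patient as a control, and swaps in a simple Cochran–Armitage trend test with no covariate adjustment. The function names `trend_test_p` and `phewas` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def trend_test_p(cases: List[int], controls: List[int]) -> float:
    """Two-sided Cochran-Armitage trend test on a 2x3 genotype table.

    cases[g] and controls[g] count individuals carrying g copies of the
    minor allele (g = 0, 1, 2); additive weights t = (0, 1, 2) are used.
    """
    weights = (0, 1, 2)
    r = sum(cases)       # total cases
    s = sum(controls)    # total controls
    n = r + s
    if n == 0:
        return 1.0
    col = [c + k for c, k in zip(cases, controls)]  # genotype column totals
    t_stat = sum(t * (c * s - k * r) for t, c, k in zip(weights, cases, controls))
    sum_t2 = sum(t * t * m for t, m in zip(weights, col))
    sum_t = sum(t * m for t, m in zip(weights, col))
    var = (r * s / n) * (n * sum_t2 - sum_t ** 2)
    if var <= 0:
        return 1.0  # no cases, no controls, or a monomorphic variant
    z = t_stat / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def phewas(genotypes: Dict[str, int],
           phenome: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Test one variant against every phenotype; return (code, p) sorted by p.

    genotypes: patient ID -> minor-allele count (0, 1 or 2)
    phenome:   phenotype code -> set of patient IDs who are cases
    Every genotyped patient who is not a case is counted as a control
    (a simplification; real pipelines often exclude related codes).
    """
    results = []
    for code, case_ids in phenome.items():
        cases, controls = [0, 0, 0], [0, 0, 0]
        for pid, g in genotypes.items():
            if pid in case_ids:
                cases[g] += 1
            else:
                controls[g] += 1
        results.append((code, trend_test_p(cases, controls)))
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy data: 8 patients and two made-up phenotype codes.
    genotypes = {"p1": 0, "p2": 0, "p3": 1, "p4": 1,
                 "p5": 2, "p6": 2, "p7": 0, "p8": 1}
    phenome = {"A": {"p4", "p5", "p6"}, "B": {"p1", "p3"}}
    for code, p in phewas(genotypes, phenome):
        print(code, round(p, 4))
```

Ranking phenotypes by p-value mirrors how PheWAS results are usually reported; with thousands of phenotypes, a multiple-testing correction and adjustment for covariates such as age and sex would normally also be applied.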


Bioinformatics | 2015

Application of clinical text data for phenome-wide association studies (PheWASs)

Scott J. Hebbring; Majid Rastegar-Mojarad; Zhan Ye; John E. Mayer; Crystal Jacobson; Simon Lin

MOTIVATION Genome-wide association studies (GWASs) are effective for describing the genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS, using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Diseases, Ninth Revision (ICD9) codes are frequently used to define the phenome, but using ICD9 codes alone misses other clinically relevant information in the EHR that can be used for PheWAS analyses and discovery. RESULTS As an alternative to ICD9 coding, a text-based phenome was defined by 23 384 clinically relevant terms extracted from Marshfield Clinic’s EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and tested for association across the text-based phenome. All five SNPs were associated with expected terms (P<0.02), most at or near the top of their respective PheWAS rankings. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for application in PheWAS.


Frontiers in Genetics | 2014

Genome wide association study of SNP-, gene-, and pathway-based approaches to identify genes influencing susceptibility to Staphylococcus aureus infections

Zhan Ye; Daniel A. Vasco; Tonia C. Carter; Murray H. Brilliant; Steven J. Schrodi; Sanjay K. Shukla

Background: We conducted a genome-wide association study (GWAS) to identify specific genetic variants that underlie susceptibility to diseases caused by Staphylococcus aureus in humans. Methods: Cases (n = 309) and controls (n = 2925) were genotyped at 508,921 single nucleotide polymorphisms (SNPs). Cases had at least one laboratory- and clinician-confirmed disease caused by S. aureus, whereas controls did not. An R package was used for SNP association, EIGENSOFT to estimate and adjust for population stratification, and VEGAS (gene-based) and DAVID, PANTHER, and Ingenuity Pathway Analysis (pathway-based) for the gene- and pathway-level analyses. Results: No SNP reached genome-wide significance. Four SNPs exceeded the p < 10⁻⁵ threshold, including two (rs2455012 and rs7152530) reaching p < 10⁻⁷. The nearby genes were PDE4B (rs2455012), TXNRD2 (rs3804047), VRK1 and BCL11B (rs7152530), and PNPLA5 (rs470093). The top two findings from the gene-based analysis were NMRK2 (p_gene = 1.20E-05), which encodes an integrin-binding molecule (focal adhesion), and DAPK3 (p_gene = 5.10E-05), a serine/threonine kinase (apoptosis and cytokinesis). The pathway analyses identified epithelial cell responses to mechanical and non-mechanical stress. Conclusion: We identified potential susceptibility genes for S. aureus diseases in this preliminary study, but confirmation by other studies is needed. The observed associations could be relevant given the complexity of S. aureus as a pathogen and its ability to exploit multiple biological pathways to cause infections in humans.


Journal of Medical Genetics | 2016

Phenome-wide association study maps new diseases to the human major histocompatibility complex region

Jixia Liu; Zhan Ye; John G. Mayer; Brian Hoch; Clayton Green; Loren A. Rolak; Christopher J. Cold; Seik-Soon Khor; Xiuwen Zheng; Taku Miyagawa; Katsushi Tokunaga; Murray H. Brilliant; Scott J. Hebbring

Background Over 160 disease phenotypes have been mapped to the major histocompatibility complex (MHC) region on chromosome 6 by genome-wide association studies (GWAS), suggesting that the MHC region as a whole may be involved in the aetiology of many phenotypes, including unstudied diseases. The phenome-wide association study (PheWAS), a powerful approach complementary to GWAS, has demonstrated its ability to discover and rediscover genetic associations. The objective of this study was to comprehensively investigate the MHC region by PheWAS to identify new phenotypes mapped to this genetically important region. Methods We systematically explored the MHC region using PheWAS to associate 2692 MHC-linked variants (minor allele frequency ≥0.01) with 6221 phenotypes in a cohort of 7481 subjects from the Marshfield Clinic Personalized Medicine Research Project. Results Expected associations previously identified by GWAS were also identified by PheWAS (eg, psoriasis, ankylosing spondylitis, type I diabetes and coeliac disease), some with strong cross-phenotype associations potentially driven by pleiotropic effects. Importantly, novel associations with eight diseases not previously assessed by GWAS (eg, lichen planus) were also identified and replicated in an independent population. Many of these associated diseases appear to be immune-related disorders. Further assessment of these diseases in 16 484 Marshfield Clinic twins suggests that some of them, including lichen planus, may have genetic aetiologies. Conclusions These results demonstrate that PheWAS is a powerful and novel method for discovering SNP–disease associations and is well suited to characterising cross-phenotype associations; they further emphasise the importance of the MHC region in human health and disease.


JMIR Research Protocols | 2015

Collecting and Analyzing Patient Experiences of Health Care From Social Media

Majid Rastegar-Mojarad; Zhan Ye; Daniel Wall; Narayana Murali; Simon Lin

Background Social media, such as Yelp, provide rich information on consumer experience. Previous studies suggest that Yelp can serve as a new source for studying patient experience. However, the lack of a corpus of patient reviews is a major bottleneck for applying computational techniques. Objective The objective of this study is to create a corpus of patient experience (COPE) and report descriptive statistics to characterize COPE. Methods Yelp reviews about health care-related businesses were extracted from the Yelp Academic Dataset. Natural language processing (NLP) tools were used to split reviews into sentences, extract noun phrases and adjectives from each sentence, and generate parse trees and dependency trees for each sentence. Sentiment analysis techniques were used to calculate a sentiment score for each sentence, and Hadoop was used for parallel processing. Results COPE contains 79,173 sentences from 6914 patient reviews of 985 health care facilities near 30 universities in the United States. We found that patients wrote longer reviews when they rated the facility poorly (1 or 2 stars). We demonstrated that the computed sentiment scores correlated well with consumer-generated ratings. A consumer vocabulary describing patients’ health care experiences was constructed by statistical analysis of word counts and co-occurrences in COPE. Conclusions A corpus called COPE was built as an initial step toward using social media to understand patient experiences at health care facilities. The corpus is available for download and can be used in future studies to extract knowledge of patients’ experiences from their own perspectives. Such information can, in turn, inform efforts to improve the quality of health care.


Science Translational Medicine | 2017

Phenome-wide scanning identifies multiple diseases and disease severity phenotypes associated with HLA variants

Jason H. Karnes; Christian M. Shaffer; Silvana Gaudieri; Yaomin Xu; Andrew M. Glazer; Jonathan D. Mosley; Shilin Zhao; Soumya Raychaudhuri; S. Mallal; Zhan Ye; John G. Mayer; Murray H. Brilliant; Scott J. Hebbring; Dan M. Roden; E. Phillips; Joshua C. Denny

Numerous associations were discovered between human leukocyte antigen (HLA) variation and a comprehensive set of phenotypes derived from electronic health records.

Hints on health and disease from HLA

Each of us expresses a mix of different human leukocyte antigens (HLAs), which present self- and foreign peptides to T cells. Because slightly different peptides are presented by each HLA type, HLA expression can influence an individual’s susceptibility to disease. Karnes et al. scrutinized electronic health record information from tens of thousands of people in two distinct cohorts to compare their phenotypes to the HLA alleles they express. This study confirmed previously identified HLA associations and also identified new ones; most associations were related to autoimmune diseases. The researchers have made the catalog freely available so that other groups can mine the data for future discoveries about how HLAs drive different phenotypes.

Although many phenotypes have been associated with variants in human leukocyte antigen (HLA) genes, the full phenotypic impact of HLA variants across all diseases is unknown. We imputed HLA genomic variation from two populations of 28,839 and 8431 individuals of European ancestry and tested the association of HLA variation with 1368 phenotypes. A total of 104 four-digit and 92 two-digit HLA allele–phenotype associations were significant in both discovery and replication cohorts, the strongest being HLA-DQB1*03:02 and type 1 diabetes. Four previously unidentified associations with two- and four-digit HLA alleles, and 10 with nonsynonymous variants, were identified across the spectrum of disease. Some conditions were associated with multiple HLA variants, and stronger associations were identified with more severe disease manifestations. A comprehensive, publicly available catalog of clinical phenotypes associated with HLA variation is provided. Examining HLA variant–disease associations in this large data set allows comprehensive definition of disease associations to drive further mechanistic insights.


Journal of Medical Genetics | 2015

SeqHBase: a big data toolset for family based sequencing data analysis

Min He; Thomas N. Person; Scott J. Hebbring; Ethan Heinzen; Zhan Ye; Steven J. Schrodi; Elizabeth McPherson; Simon M. Lin; Peggy L. Peissig; Murray H. Brilliant; Jason O'Rawe; Reid J. Robison; Gholson J. Lyon; Kai Wang

Background Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. Processing such data can be a significant challenge, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, and to conduct family-based sequencing data analysis. Methods Hadoop is a framework for reliable, scalable, distributed processing of large data sets using the MapReduce programming model. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family-based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). Results We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis time scaled almost linearly with the number of data nodes. With 20 data nodes, SeqHBase took about 5 seconds to analyse WES familial data and approximately 1 minute to analyse WGS familial data. Conclusions These results demonstrate SeqHBase’s high efficiency and scalability, which are necessary as WGS and WES rapidly become standard methods for studying the genetics of familial disorders.
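The three inheritance patterns SeqHBase screens for can be made concrete with a small sketch for a single parent–child trio. This is a hypothetical, in-memory illustration only: it does not use Hadoop or HBase, ignores BAM coverage and functional annotation, handles trios rather than larger pedigrees, and assumes genotypes are coded as alternate-allele counts (0, 1, 2). It also assumes that "inherited homozygous" means both parents are heterozygous carriers. The `TrioVariant` class and `classify_trio` function are made-up names, not part of SeqHBase.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrioVariant:
    gene: str
    variant_id: str
    child: int   # alternate-allele count: 0, 1 or 2
    father: int
    mother: int


def classify_trio(variants: List[TrioVariant]) -> Dict[str, List[str]]:
    """Group variant IDs by a simple Mendelian inheritance pattern."""
    result: Dict[str, List[str]] = {
        "de_novo": [], "inherited_homozygous": [], "compound_heterozygous": []}
    # Heterozygous variants in the child, split by the transmitting parent.
    paternal = defaultdict(list)
    maternal = defaultdict(list)
    for v in variants:
        if v.child > 0 and v.father == 0 and v.mother == 0:
            # Child carries an alt allele that neither parent has.
            result["de_novo"].append(v.variant_id)
        elif v.child == 2 and v.father == 1 and v.mother == 1:
            # Child is homozygous; both parents are heterozygous carriers.
            result["inherited_homozygous"].append(v.variant_id)
        elif v.child == 1 and v.father == 1 and v.mother == 0:
            paternal[v.gene].append(v.variant_id)
        elif v.child == 1 and v.father == 0 and v.mother == 1:
            maternal[v.gene].append(v.variant_id)
    # Compound het: at least one paternal and one maternal het in the same gene.
    for gene in sorted(paternal.keys() & maternal.keys()):
        result["compound_heterozygous"].extend(paternal[gene] + maternal[gene])
    return result


if __name__ == "__main__":
    # Hypothetical toy trio genotypes; gene names and IDs are made up.
    trio = [
        TrioVariant("GENE1", "v1", child=1, father=0, mother=0),
        TrioVariant("GENE2", "v2", child=2, father=1, mother=1),
        TrioVariant("GENE3", "v3", child=1, father=1, mother=0),
        TrioVariant("GENE3", "v4", child=1, father=0, mother=1),
        TrioVariant("GENE4", "v5", child=1, father=1, mother=0),
    ]
    print(classify_trio(trio))
```

Each variant is examined independently except for the compound-heterozygous check, which needs a per-gene grouping; that per-gene step is the kind of work a distributed store keyed by gene or position could parallelize, though how SeqHBase actually partitions it is not described here.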

Collaboration


Top Co-Authors

John E. Mayer

Boston Children's Hospital

David C. Page

University of Wisconsin-Madison
