Publication


Featured research published by Jiayin Wang.


BMC Genomics | 2017

An improved burden-test pipeline for identifying associations from rare germline and somatic variants

Yu Geng; Zhongmeng Zhao; Xuanping Zhang; Wenke Wang; Xingjian Cui; Kai Ye; Xiao Xiao; Jiayin Wang

Background: Identifying rare germline and somatic variants associated with cancer progression is an important research topic in cancer genomics. Although many approaches have been proposed for rare variant association studies, they are not well suited to cancer sequencing data for several reasons, such as overreliance on variant pre-selection and neglect of interacting hotspots.
Results: In this article, we propose an improved pipeline to identify interactions between germline variants and somatic mutations that influence cancer susceptibility from paired cancer sequencing data. The proposed pipeline, RareProb-C, performs an algorithmic selection on the given variants by incorporating their variant allelic frequencies. Interactions among the variants are considered within regions delimited by a four-gamete test. The pipeline then filters out singular cases according to the posterior probability at each site. Finally, it outputs the selected candidates that pass a collapse test.
Conclusions: We apply RareProb-C to a series of carefully constructed simulation cases, where it outperforms six existing genetic model-free approaches. We also test RareProb-C on 429 TCGA ovarian cancer cases, where it successfully identifies the known highlighted variants that are considered to increase disease susceptibility.
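The four-gamete test mentioned above is a standard check: under the infinite-sites model, if two biallelic sites show all four haplotypes (00, 01, 10, 11), at least one recombination or recurrent mutation must lie between them. The sketch below applies that check to phased 0/1 haplotype strings and greedily splits consecutive sites into compatible blocks. The input encoding, the greedy splitting rule and the function names are assumptions for illustration; this is not the RareProb-C implementation.

```python
def four_gamete_violation(haplotypes, i, j):
    """True if sites i and j carry all four gametes (00, 01, 10, 11).

    haplotypes: equal-length strings over {'0', '1'}, one per phased haplotype.
    Under the infinite-sites model, seeing all four gametes implies at least
    one recombination (or a recurrent mutation) between the two sites.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def four_gamete_blocks(haplotypes):
    """Greedily split consecutive sites into blocks with no four-gamete violation.

    Returns a list of inclusive (first_site, last_site) index pairs.
    """
    n_sites = len(haplotypes[0])
    blocks = []
    start = 0
    for site in range(1, n_sites):
        # Close the current block if the new site conflicts with any site in it.
        if any(four_gamete_violation(haplotypes, k, site) for k in range(start, site)):
            blocks.append((start, site - 1))
            start = site
    blocks.append((start, n_sites - 1))
    return blocks


haps = ["0000", "0100", "1111", "0011"]
print(four_gamete_blocks(haps))  # [(0, 1), (2, 3)]
```

Here sites 1 and 2 show all four gametes, so the sketch starts a new block at site 2.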


International Conference on Intelligent Computing | 2017

Identifying Heterogeneity Patterns of Allelic Imbalance on Germline Variants to Infer Clonal Architecture

Yu Geng; Zhongmeng Zhao; Jing Xu; Ruoyu Liu; Yi Huang; Xuanping Zhang; Xiao Xiao; Maomao; Jiayin Wang

It has been suggested that the evolution of somatic mutations may be significantly impacted by inherited polymorphisms, while clonal somatic copy-number mutations may contribute to the potential selective advantages of heterozygous germline variants. A fine-resolution view of the clonal architecture of such cooperative germline-somatic dynamics provides insight into tumour heterogeneity and has clinical implications. Although germline allelic imbalance patterns are reported to often play important roles, existing approaches for clonal analysis mainly focus on single nucleotide sites. To address this need, we propose a computational method, GLClone, that identifies and estimates the clonal patterns of copy-number alterations on germline variants. The core of GLClone is a hierarchical probabilistic model. The variant allelic frequencies on germline variants are modeled as observed variables, while the cellular prevalence is designed as hidden states and estimated by Bayesian posteriors. A variational approximation algorithm is proposed to train the model and estimate the unknown variables and model parameters. We examine GLClone on several groups of simulation datasets generated with different configurations and compare it with three popular state-of-the-art approaches; GLClone achieves higher accuracy, especially when a complex clonal structure exists.
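To make the link between observed allele frequencies and hidden cellular prevalence concrete, the sketch below uses the standard two-population mixing relationship at a heterozygous germline (A/B) site: a fraction of cells carries an altered copy-number state (n_a, n_b) and the rest are assumed diploid. It is a minimal illustration under that assumption, not GLClone's hierarchical model or its variational inference; the function names are hypothetical.

```python
def expected_baf(prevalence, n_a, n_b):
    """Expected B-allele frequency at a heterozygous germline (A/B) site.

    A fraction `prevalence` of cells carries n_a copies of allele A and n_b
    copies of allele B; all other cells are assumed diploid (one A, one B).
    """
    b_copies = (1 - prevalence) * 1 + prevalence * n_b
    all_copies = (1 - prevalence) * 2 + prevalence * (n_a + n_b)
    return b_copies / all_copies


def prevalence_from_baf(baf, n_a, n_b):
    """Solve expected_baf(c, n_a, n_b) == baf for the cellular prevalence c."""
    denom = baf * (n_a + n_b - 2) - n_b + 1
    if abs(denom) < 1e-12:
        # e.g. copy-neutral (1, 1): the BAF stays at 0.5 for every prevalence.
        raise ValueError("prevalence is not identifiable for this copy-number state")
    return (1 - 2 * baf) / denom


# Loss of heterozygosity (n_a=1, n_b=0) in half of the cells gives a BAF of 1/3.
baf = expected_baf(0.5, 1, 0)
print(prevalence_from_baf(baf, 1, 0))  # approximately 0.5
```

A probabilistic model would place a prior on the prevalence and the copy-number state instead of inverting a single noisy BAF, but the mean relationship is the same.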


International Conference on Intelligent Computing | 2017

Accurately Estimating Tumor Purity of Samples with High Degree of Heterogeneity from Cancer Sequencing Data

Yu Geng; Zhongmeng Zhao; Ruoyu Liu; Tian Zheng; Jing Xu; Yi Huang; Xuanping Zhang; Xiao Xiao; Jiayin Wang

Tumor purity is the proportion of tumor cells in the sampled admixture. Estimating tumor purity is one of the key steps both for understanding the tumor microenvironment and for reducing false positives and false negatives in genomic analysis. However, existing approaches often lose accuracy when analyzing samples with a high degree of heterogeneity. The patterns of clonal architecture in the sequencing data interfere with the signals that purity estimation algorithms expect. In this article, we propose a computational method, EMPurity, which is able to accurately infer the tumor purity of samples with a high degree of heterogeneity. EMPurity captures the patterns of both tumor purity and clonal structure with a probabilistic model. The model parameters are calculated directly from aligned reads, which prevents errors in the variant calling results from propagating into the estimate. We test EMPurity on a series of datasets, comparing it with three popular approaches, and EMPurity outperforms them under different simulation configurations.
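One way to see why heterogeneity interferes with purity estimation: subclonal mutations pull the average VAF down, so a naive "purity = 2 x mean VAF" estimate is biased. The sketch below instead fits a binomial mixture to per-site read counts with expectation-maximization and takes twice the highest cluster frequency. It assumes that the top cluster holds clonal heterozygous mutations in copy-neutral diploid regions (VAF = purity / 2); the number of clusters, the initialization and the function names are illustrative choices, not EMPurity's model.

```python
import math


def em_binomial_mixture(alt, depth, k=2, n_iter=200, eps=1e-6):
    """Fit a k-component binomial mixture to per-site (alt, depth) read counts.

    Returns (freqs, weights), both ordered by ascending allele frequency.
    """
    vafs = sorted(a / d for a, d in zip(alt, depth))
    # Initialise component frequencies at evenly spaced quantiles of the VAFs.
    freqs = [vafs[int((j + 0.5) / k * len(vafs))] for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        freqs = [min(max(f, eps), 1 - eps) for f in freqs]
        # E-step: posterior responsibility of each component for each site,
        # computed in log space (the binomial coefficient cancels out).
        resp = []
        for a, d in zip(alt, depth):
            logs = [math.log(w) + a * math.log(f) + (d - a) * math.log(1 - f)
                    for f, w in zip(freqs, weights)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: read-weighted allele frequency and mixing weight per component.
        for j in range(k):
            r = [row[j] for row in resp]
            alt_sum = sum(rj * a for rj, a in zip(r, alt))
            depth_sum = sum(rj * d for rj, d in zip(r, depth))
            freqs[j] = alt_sum / max(depth_sum, eps)
            weights[j] = max(sum(r) / len(r), eps)
    order = sorted(range(k), key=lambda j: freqs[j])
    return [freqs[j] for j in order], [weights[j] for j in order]


def estimate_purity(alt, depth, k=2):
    """Purity = 2 x the highest cluster VAF.

    Assumes that cluster holds clonal heterozygous somatic mutations in
    copy-neutral diploid regions, where VAF = purity / 2.
    """
    freqs, _ = em_binomial_mixture(alt, depth, k)
    return min(1.0, 2 * freqs[-1])


# Clonal mutations near VAF 0.4 plus a subclone near VAF 0.2, depth 100.
alt = [38, 39, 40, 41, 42] * 30 + [18, 19, 20, 21, 22] * 30
depth = [100] * len(alt)
print(estimate_purity(alt, depth))  # approximately 0.8
```

On the same data, twice the pooled VAF would give roughly 0.6, which shows the bias the mixture avoids.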


International Conference on Bioinformatics and Biomedical Engineering | 2017

An Expanded Association Approach for Rare Germline Variants with Copy-Number Alternation

Yu Geng; Zhongmeng Zhao; Daibin Cui; Tian Zheng; Xuanping Zhang; Xiao Xiao; Jiayin Wang

Tumorigenesis is considered a complex process that is often driven by close interactions between germline variants and accumulated somatic mutational events. Recent studies report that some somatic copy-number alterations show such interactions by harboring germline susceptibility variants under potential selection in clonal expansions. Incorporating these interactions into genetic association approaches could be valuable not only for discovering novel susceptibility variants but also for providing insight into tumor heterogeneity and its clinical implications. To address this need, in this article we propose RareProb-G, an expanded version of a computational method, designed to identify rare germline susceptibility variants located in regions of somatic allelic amplification or loss of heterozygosity. RareProb-G is based on a hidden Markov random field model. The interactions among germline variants and somatic events are modeled by a neighborhood system, which is bounded by a t-test on variant allelic frequencies. Each variant is assigned one of four hidden states, which together represent its regional status and its causal/neutral status. A hidden Markov model is also introduced to estimate the initial values of the hidden states and the unknown model parameters. To verify this approach, we conduct a series of simulation experiments under different configurations, and RareProb-G outperforms RareProb in both sensitivity and specificity.
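The abstract states only that the neighborhood system is bounded by a t-test on variant allelic frequencies. A minimal sketch of what such a boundary rule could look like: walk along position-ordered germline variants in fixed windows and close the current neighborhood when the next window's VAFs differ significantly (Welch's t statistic above a fixed threshold). The window size, the threshold of 3.0 and the function names are hypothetical; the real method may test differently.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two samples (each needs at least two values)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    diff = mean(x) - mean(y)
    if se == 0:
        # Zero spread in both samples: any mean difference is decisive.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def vaf_neighbourhoods(vafs, window=5, t_crit=3.0):
    """Hypothetical boundary rule: walk along ordered variants in fixed windows
    and close the current neighbourhood when the next window's VAFs differ from
    it with |t| >= t_crit. Returns half-open (start, end) index ranges.
    """
    bounds = []
    start = 0
    for i in range(window, len(vafs) - window + 1, window):
        if abs(welch_t(vafs[start:i], vafs[i:i + window])) >= t_crit:
            bounds.append((start, i))
            start = i
    bounds.append((start, len(vafs)))
    return bounds


balanced = [0.48, 0.52, 0.50, 0.49, 0.51] * 2    # heterozygous, no imbalance
imbalanced = [0.30, 0.28, 0.32, 0.31, 0.29] * 2  # shifted VAFs, e.g. allelic imbalance
print(vaf_neighbourhoods(balanced + imbalanced))  # [(0, 10), (10, 20)]
```

A p-value threshold from the t distribution could replace the fixed `t_crit`; the fixed value keeps the sketch dependency-free.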


International Conference on Intelligent Computing | 2018

TNSim: A Tumor Sequencing Data Simulator for Incorporating Clonality Information

Yu Geng; Zhongmeng Zhao; Mingzhe Xu; Xuanping Zhang; Xiao Xiao; Jiayin Wang

In recent years, next-generation sequencing has enabled us to obtain high-resolution landscapes of genetic changes at the single-nucleotide level, and more and more novel methods have been proposed for efficient and effective analysis of cancer sequencing data. To facilitate such development, a data simulator is a crucial tool that not only tests and evaluates proposed approaches but also provides feedback for further improvement. Several simulators have been released to generate next-generation sequencing data. However, to the best of our knowledge, none of them considers clonality information. Clonal heterogeneity is suggested to be widespread in tumor samples. The patterns of somatic mutational events usually exhibit a wide spectrum of variant allelic frequencies, and some of these events are detectable in only one or a few clonal lineages. In this article, we introduce a Tumor-Normal sequencing Simulator, TNSim, that generates next-generation sequencing data incorporating clonality information. The simulator is able to mimic a tumor sample and its paired normal sample, in which germline variants and somatic mutations can be specified separately. Tumor purity is adjustable. The clonal architecture is preassigned as one or more clonal lineages, each consisting of a set of somatic mutations with similar variant allelic frequencies. A group of experiments is conducted to evaluate its performance. The statistical features of the artificial sequencing reads are comparable to those of real tumor sequencing data from samples consisting of multiple subclones. The source code is available at http://github.com/lnmxgy/TNSim for academic use only.
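The key quantity a clonality-aware simulator has to get right is the expected VAF of each somatic mutation, which depends on tumor purity and on the prevalence of the lineage carrying it. The sketch below simulates only per-site alternate-read counts (not actual reads) under simplifying assumptions: heterozygous mutations in copy-neutral diploid regions, a fixed depth, and no sequencing errors. The `Lineage` structure and function names are hypothetical and do not reflect TNSim's input format or code.

```python
import random
from dataclasses import dataclass


@dataclass
class Lineage:
    """Hypothetical clonal lineage description (not TNSim's input format)."""
    prevalence: float  # fraction of tumour cells carrying this lineage's mutations
    n_mutations: int   # number of somatic mutations assigned to the lineage


def expected_somatic_vaf(purity, prevalence):
    """VAF of a heterozygous somatic mutation in a copy-neutral diploid region."""
    return purity * prevalence / 2


def simulate_alt_counts(purity, lineages, depth, seed=0):
    """Draw tumour alt-read counts per somatic site.

    Each read independently carries the alt allele with the expected VAF,
    i.e. alt ~ Binomial(depth, VAF). Returns (lineage_index, alt, depth) tuples.
    In the matched normal, these somatic sites would carry no alt reads
    (sequencing errors are ignored here).
    """
    rng = random.Random(seed)
    sites = []
    for idx, lineage in enumerate(lineages):
        vaf = expected_somatic_vaf(purity, lineage.prevalence)
        for _ in range(lineage.n_mutations):
            alt = sum(rng.random() < vaf for _ in range(depth))
            sites.append((idx, alt, depth))
    return sites


# Purity 0.8 with a clonal lineage and a subclone present in half the tumour cells.
sites = simulate_alt_counts(0.8, [Lineage(1.0, 300), Lineage(0.5, 300)], depth=100, seed=1)
for idx in (0, 1):
    vafs = [alt / d for i, alt, d in sites if i == idx]
    print(idx, round(sum(vafs) / len(vafs), 3))  # near 0.4 and 0.2 respectively
```

A read-level simulator would additionally sample read positions, sequencing errors and variable coverage; this sketch isolates the purity and prevalence arithmetic.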


International Conference on Bioinformatics and Biomedical Engineering | 2018

Estimating the Length Distributions of Genomic Micro-satellites from Next Generation Sequencing Data

Xuan Feng; Huan Hu; Zhongmeng Zhao; Xuanping Zhang; Jiayin Wang

Genomic micro-satellites are genomic regions that consist of short, repetitive DNA motifs. In contrast to unique genomic sequence, micro-satellites exhibit high intrinsic polymorphism, which derives mainly from variability in length. Length distributions are widely used to represent this polymorphism. Recent studies report that some micro-satellites significantly alter their length distributions in tumor tissue samples compared with those observed in normal samples, which has become a hot topic in cancer genomics. Several state-of-the-art approaches have been proposed to identify length distributions from sequencing data. However, existing approaches can only handle micro-satellites shorter than one read length, which limits research on long micro-satellite events. In this article, we propose a probabilistic approach, implemented as ELMSI, that estimates the length distributions of micro-satellites longer than one read length. The core algorithm works on a set of mapped reads. It first clusters the reads and adopts a k-mer extension algorithm to detect the repeat unit and the breakpoints. It then runs an expectation-maximization algorithm to approach the true length distributions. According to the experiments, ELMSI is able to handle micro-satellites with lengths ranging from shorter than one read length up to the 10 kbp scale. A series of comparison experiments is conducted, varying the number of micro-satellite regions, read lengths and sequencing coverages, and ELMSI outperforms MSIsensor in most cases.
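Detecting the repeat unit and the repeat boundaries is the first step described above. As a simplified illustration (a period-scan heuristic, not ELMSI's k-mer extension algorithm), the sketch picks the smallest period whose self-match rate is high and then finds the longest run of exact consecutive copies of that unit inside a read. It handles only repeats contained within a single read and omits the EM step; the thresholds and function names are assumptions.

```python
def repeat_unit(seq, max_period=6, min_match=0.9):
    """Smallest period p <= max_period with self-match rate >= min_match.

    The self-match rate is the fraction of positions i with seq[i] == seq[i + p].
    Returns the motif seq[:p], or None if no period qualifies.
    """
    for p in range(1, min(max_period, len(seq) // 2) + 1):
        matches = sum(seq[i] == seq[i + p] for i in range(len(seq) - p))
        if matches / (len(seq) - p) >= min_match:
            return seq[:p]
    return None


def longest_tandem_run(read, unit):
    """(start, end) of the longest run of exact, consecutive copies of unit."""
    if not unit:
        raise ValueError("unit must be non-empty")
    k = len(unit)
    best = (0, 0)
    for start in range(len(read) - k + 1):
        end = start
        # Extend copy by copy while the next k bases still match the unit.
        while read[end:end + k] == unit:
            end += k
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


read = "GGCA" + "AC" * 10 + "TTG"
start, end = longest_tandem_run(read, repeat_unit("AC" * 5))
print(start, end, (end - start) // 2)  # 4 24 10
```

The returned `(start, end)` pair marks the breakpoints within the read, and the copy count gives one length observation; micro-satellites longer than a read would need information combined across reads.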


International Conference on Bioinformatics and Biomedical Engineering | 2018

CIGenotyper: A Machine Learning Approach for Genotyping Complex Indel Calls

Tian Zheng; Yang Li; Yu Geng; Zhongmeng Zhao; Xuanping Zhang; Xiao Xiao; Jiayin Wang

Complex insertion and deletion (complex indel) is a rare category of genomic structural variation. A complex indel presents as one or more DNA fragments inserted at a genomic location where a deletion occurs. Several studies emphasize the importance of complex indels, and some state-of-the-art approaches have been proposed to detect them from sequencing data. However, genotyping complex indel calls is another challenging computational problem, because some features commonly used for genotyping indel calls from sequencing data can be invalid owing to the components of complex indels. Thus, in this article, we propose a machine learning approach, CIGenotyper, to estimate the genotypes of complex indel calls. CIGenotyper adopts a relevance vector machine (RVM) framework. For each candidate call, it first extracts a set of features from the candidate region, usually including the read depth, the variant allelic frequency for aligned contigs, and the numbers of split and discordant paired-end reads. Given the features of a complex indel call, the trained RVM outputs the genotype with the highest likelihood. An algorithm is also proposed to train the RVM. We compare our approach with two popular approaches, Gindel and Pindel, on multiple groups of artificial datasets. Our model outperforms them in average success rate in most cases when varying the coverage of the given data, the read lengths and the length distributions of the pre-set complex indels.


Molecules | 2018

Synstable Fusion: A Network-Based Algorithm for Estimating Driver Genes in Fusion Structures

Mingzhe Xu; Zhongmeng Zhao; Xuanping Zhang; Aiqing Gao; Shuyan Wu; Jiayin Wang

Gene fusion structures are a class of common somatic mutational events in cancer genomes and are often formed by chromosomal mutations. Identifying the driver gene(s) in a fusion structure is important for many downstream analyses and contributes to clinical practice. Existing computational approaches have prioritized oncogenes by incorporating prior knowledge from gene networks. However, different methods have different weaknesses when handling gene fusion data, owing to issues such as fusion gene representation, network integration, and the effectiveness of the evaluation algorithms. In this paper, Synstable Fusion (SYN), an algorithm for computationally evaluating fusion genes, is proposed. The algorithm uses a network-based strategy, incorporating gene networks as prior information, but estimates the driver genes according to the destructiveness hypothesis. This hypothesis balances the two popular evaluation strategies in existing studies, thereby providing more comprehensive results. A machine learning framework is introduced to integrate multiple networks and resolve conflicting results from different networks. In addition, a synchronous stability model is established to reduce the computational complexity of the evaluation algorithm. To evaluate the proposed algorithm, we conduct a series of experiments on both artificial and real datasets. The results demonstrate that the proposed algorithm performs well under different configurations and is robust to changes in its internal parameter settings.


International Conference on Intelligent Computing | 2017

A Fast Optimization Algorithm for K-Coverage Problem.

Jingwen Pei; Maomao; Jiayin Wang

K-coverage optimization, which minimizes the number of directional receivers needed to guarantee that a given region is covered k times, is widely used in healthcare environments. Because K-coverage optimization is NP-hard, a commonly used approach for finding the optimal solution is integer linear programming (ILP). However, ILP can be slow for many practical instances. In this article, we propose an exact dynamic programming algorithm based on tree decomposition. A probability density function is introduced to describe the relative importance of different areas and to compute the minimal optimized substructures. We also show that this algorithm can easily be extended to provide exact solutions for different coverage-ratio requirements. Compared with the ILP approach and a greedy algorithm, our algorithm provides accurate solutions without sacrificing much efficiency.
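For orientation, the greedy baseline mentioned above can be sketched as a weighted set-multicover heuristic: repeatedly pick the receiver that satisfies the most remaining importance-weighted coverage demand until every point is covered k times. This is not the authors' tree-decomposition dynamic program and carries no optimality guarantee. Modeling the region as a discrete set of points, encoding each receiver's footprint as a set, and using a per-point weight dictionary in place of the probability density are all assumptions; the function name is hypothetical.

```python
def greedy_k_coverage(cover_sets, points, k, weight=None):
    """Greedy heuristic for k-coverage (weighted set multicover).

    cover_sets: dict mapping a candidate receiver to the set of points it covers.
    weight: optional dict of point importance (defaults to 1.0 for every point).
    Each receiver can be chosen at most once. Returns receivers in pick order.
    """
    weight = weight or {p: 1.0 for p in points}
    residual = {p: k for p in points}  # coverage still required per point
    remaining = dict(cover_sets)
    chosen = []

    def gain(receiver):
        # Importance-weighted demand this receiver would still satisfy.
        return sum(weight[p] for p in remaining[receiver] if residual.get(p, 0) > 0)

    while any(r > 0 for r in residual.values()):
        best = max(remaining, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("the points cannot all be covered k times")
        chosen.append(best)
        for p in remaining.pop(best):
            if residual.get(p, 0) > 0:
                residual[p] -= 1
    return chosen


receivers = {"a": {1, 2, 3}, "b": {1, 2}, "c": {3}, "d": {2}}
print(greedy_k_coverage(receivers, {1, 2, 3}, k=2))  # ['a', 'b', 'c']
```

Greedy selection is fast, but it can use more receivers than an exact ILP or dynamic-programming solution on adversarial instances.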


International Conference on Intelligent Computing | 2017

An Ant-Colony Based Approach for Identifying a Minimal Set of Rare Variants Underlying Complex Traits.

Xuanping Zhang; Zhongmeng Zhao; Yan Chang; Aiyuan Yang; Yixuan Wang; Ruoyu Liu; Maomao; Xiao Xiao; Jiayin Wang

Identifying associations between genetic variants and observed traits is one of the basic problems in genomics. Existing association approaches mainly adopt a collapsing strategy for rare variants. However, these approaches rely heavily on the quality of variant selection and lose statistical power if neutral variants are collapsed together. To overcome these weaknesses, in this article we propose a novel association approach that aims to obtain a minimal set of candidate variants. The approach incorporates ant-colony optimization into a collapsing model. Several classes of ants are designed, and each class is assigned to one particular interval of the solution space. Each ant prefers to build optimal solutions within its assigned region, while communicating with other ants and voting for a small number of locally optimal solutions. This framework improves the search for globally optimal solutions. We conduct multiple groups of experiments on semi-simulated datasets with different configurations. The approach outperforms three popular approaches, both increasing statistical power and decreasing type I and type II errors.
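Why collapsing neutral variants costs power can be shown with a CAST-style burden test: each individual is marked as a carrier if they have a minor allele at any variant in the chosen set, and carrier counts in cases and controls are compared with a one-sided Fisher exact test. In an ant-colony search, a test like this could score each candidate variant set. The sketch is a generic collapsing test, not the authors' collapsing model; the toy genotypes and function names are hypothetical.

```python
from math import comb


def collapse(genotypes, variant_set):
    """CAST-style collapsing: 1 if an individual carries a minor allele at any
    variant in variant_set, else 0. genotypes[i][v] is a minor-allele count."""
    return [int(any(g[v] > 0 for v in variant_set)) for g in genotypes]


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for carrier enrichment in cases.

    2x2 table: [[case carriers a, case non-carriers b],
                [control carriers c, control non-carriers d]].
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    total = comb(n_cases + n_controls, n_carriers)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(n_cases, x) * comb(n_controls, n_carriers - x)
               for x in range(a, min(n_cases, n_carriers) + 1))
    return tail / total


def burden_p(genotypes, is_case, variant_set):
    """Collapse variant_set into one indicator and test it against case status."""
    pairs = list(zip(collapse(genotypes, variant_set), is_case))
    a = sum(1 for carries, case in pairs if carries and case)
    b = sum(1 for carries, case in pairs if not carries and case)
    c = sum(1 for carries, case in pairs if carries and not case)
    d = sum(1 for carries, case in pairs if not carries and not case)
    return fisher_greater(a, b, c, d)


# Variant 0 is carried only by cases; variant 1 (neutral) only by controls.
genotypes = [[1, 0], [1, 0], [1, 0], [0, 0],   # cases
             [0, 1], [0, 1], [0, 0], [0, 0]]   # controls
is_case = [True] * 4 + [False] * 4
print(burden_p(genotypes, is_case, {0}))     # 4/56, about 0.071
print(burden_p(genotypes, is_case, {0, 1}))  # 28/56 = 0.5
```

Adding the neutral variant 1 to the collapsed set raises the p-value from about 0.071 to 0.5, which is the power loss that motivates searching for a minimal variant set.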

Collaboration

Top Co-Authors

Xuanping Zhang
Xi'an Jiaotong University

Zhongmeng Zhao
Xi'an Jiaotong University

Yu Geng
Xi'an Jiaotong University

Xiao Xiao
Xi'an Jiaotong University

Tian Zheng
Xi'an Jiaotong University

Yi Huang
Xi'an Jiaotong University

Yixuan Wang
Xi'an Jiaotong University

Mingzhe Xu
Xi'an Jiaotong University

Rong Zhang
Xi'an Jiaotong University

Ruoyu Liu
Xi'an Jiaotong University