Publication


Featured research published by Xu Shi.


BMC Genomics | 2015

BMRF-MI: integrative identification of protein interaction network by modeling the gene dependency

Xu Shi; Xiao Wang; Ayesha N. Shajahan; Leena Hilakivi-Clarke; Robert Clarke; Jianhua Xuan

Background: Identification of protein interaction networks is an important step for understanding the molecular mechanisms in cancer. Several methods have been developed to integrate protein-protein interaction (PPI) data with gene expression data for network identification. However, they often fail to model the dependency between genes in the network, which leaves many important genes, especially upstream genes, unidentified. It is necessary to develop a method that improves network identification performance by incorporating the dependency between genes. Results: We proposed an approach for identifying protein interaction networks by incorporating mutual information (MI) into a Markov random field (MRF) based framework to model the dependency between genes. MI is widely used in information theory to measure the mutual dependence between random variables. Unlike the traditional Pearson correlation test, MI is capable of capturing both linear and non-linear relationships between random variables. Among the existing MI estimators, we choose the k-nearest neighbor MI (kNN-MI) estimator, which has been shown to have minimum bias. The estimated MI is integrated with an MRF framework to model the gene dependency in the context of the network. Maximum a posteriori (MAP) estimation is applied to the MRF-based model to estimate the network score. To reduce the computational complexity of finding the optimal network, a probabilistic searching algorithm is implemented. We further increase the robustness and reproducibility of the results by applying a non-parametric bootstrapping method to measure the confidence level of the identified genes. To evaluate the performance of the proposed method, we test it on simulation data under different conditions. The experimental results show improved accuracy in subnetwork identification compared to existing methods. Furthermore, we applied our method to real breast cancer patient data; the identified protein interaction network shows a close association with the recurrence of breast cancer, which is supported by functional annotation. We also show that the identified subnetworks can be used to predict the recurrence status of cancer patients by survival analysis. Conclusions: We have developed an integrated approach for protein interaction network identification, which combines a Markov random field framework and mutual information to model the gene dependency in the PPI network. Improvements in subnetwork identification over existing methods have been demonstrated with simulation datasets. We then apply our method to breast cancer patient data to identify recurrence-related subnetworks. The experimental results show that the identified genes are enriched in the pathways and functional categories relevant to progression and recurrence of breast cancer. Finally, the survival analysis based on the identified subnetworks achieves a good result in classifying the recurrence status of cancer patients.
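The abstract's point that MI captures non-linear relationships that Pearson correlation misses can be illustrated with a small sketch. Note the assumptions: this uses a simple equal-width histogram (plug-in) estimator rather than the kNN-MI estimator named in the abstract, and the function names `pearson` and `binned_mi`, the bin count, and the toy data are illustrative choices, not part of BMRF-MI.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def binned_mi(xs, ys, bins=8):
    """Plug-in mutual information (in nats) from an equal-width 2-D histogram.

    Assumption: a histogram estimator stands in for the kNN-MI estimator.
    """
    def to_bins(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins
        # Clamp so the maximum value falls into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in vals]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum over occupied cells of p(x,y) * log(p(x,y) / (p(x) p(y))).
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


# Toy data: y depends on x deterministically, but not linearly.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]

r = pearson(xs, ys)
mi = binned_mi(xs, ys)
print(f"Pearson r = {r:.3f}, binned MI = {mi:.3f} nats")
```

Because the toy data are symmetric around zero, the linear correlation is essentially zero while the MI estimate is clearly positive, which is the kind of dependency a correlation-based score would overlook.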


BMC Systems Biology | 2013

mAPC-GibbsOS: an integrated approach for robust identification of gene regulatory networks

Xu Shi; Jinghua Gu; Xi Chen; Ayesha N. Shajahan; Leena Hilakivi-Clarke; Robert Clarke; Jianhua Xuan

Background: Identification of cooperative gene regulatory networks is an important topic in biological studies, especially in cancer research. Traditional approaches suffer from large noise in gene expression data and false positive connections in motif binding data; they also fail to identify the modularized structure of gene regulatory networks. Methods that can reveal the underlying modularized structure and are robust to noise and false positives need to be developed. Results: We proposed and developed an integrated approach to identify gene regulatory networks, which consists of a novel clustering method, motif-guided affinity propagation clustering (mAPC), and a sampling-based method, a Gibbs sampler based on the outlier sum statistic (GibbsOS). mAPC is used in the first step to obtain co-regulated gene modules by clustering genes with a similarity measurement that takes into account both gene expression data and binding motif information. This clustering method can reduce the noise effect from microarray data to obtain modularized gene clusters. However, due to many false positives in motif binding data, some genes not regulated by certain transcription factors (TFs) will be falsely clustered with true target genes. To overcome this problem, GibbsOS is applied in the second step to refine each cluster for the identification of true target genes. To evaluate the performance of the proposed method, we generated simulation data under different signal-to-noise ratios and false positive ratios. The experimental results show improved accuracy in clustering and transcription factor identification. Moreover, improved performance is demonstrated in target gene identification as compared with GibbsOS. Finally, we applied the proposed method to two breast cancer patient datasets to identify cooperative transcriptional regulatory networks associated with recurrence of breast cancer, as supported by their functional annotations. Conclusions: We have developed a two-step approach for gene regulatory network identification, featuring an integrated method to identify modularized regulatory structures and subsequently refine their target genes. Simulation studies have shown the robustness of the method against noise in gene expression data and false positives in motif binding data. The proposed method has been applied to two breast cancer gene expression datasets to infer the hidden regulation mechanisms. The experimental results demonstrate the efficacy of the method in identifying key regulatory networks related to the progression and recurrence of breast cancer.


Bioinformatics | 2015

BMRF-Net: a software tool for identification of protein interaction subnetworks by a bagging Markov random field-based method

Xu Shi; Robert O. Barnes; Li Chen; Ayesha N. Shajahan-Haq; Leena Hilakivi-Clarke; Robert Clarke; Yue Joseph Wang; Jianhua Xuan

Identification of protein interaction subnetworks is an important step to help us understand complex molecular mechanisms in cancer. In this paper, we develop a BMRF-Net package, implemented in Java and C++, to identify protein interaction subnetworks based on a bagging Markov random field (BMRF) framework. By integrating gene expression data and protein-protein interaction data, this software tool can be used to identify biologically meaningful subnetworks. A user-friendly graphical user interface is developed as a Cytoscape plugin for the BMRF-Net software to handle input and output. The detailed structure of the identified networks can be visualized conveniently in Cytoscape. The BMRF-Net package has been applied to breast cancer data to identify significant subnetworks related to breast cancer recurrence. Availability and implementation: The BMRF-Net package is available at http://sourceforge.net/projects/bmrfcjava/. The package is tested under Ubuntu 12.04 (64-bit), Java 7, glibc 2.15 and Cytoscape 3.1.0.
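The bagging idea behind a framework like BMRF can be sketched in a few lines: resample the patients with replacement, rerun the selection on each resample, and report how often each gene is selected. This is only a minimal sketch under stated assumptions. The selector `select_top_genes` (top-k genes by absolute mean difference between classes) is a hypothetical stand-in for the MRF-based subnetwork search, and all names, parameters, and the toy data are illustrative rather than taken from the BMRF-Net package.

```python
import random


def select_top_genes(samples, labels, k):
    """Hypothetical selector: top-k genes by |mean difference| between classes."""
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        # A resample containing a single class cannot rank genes.
        return set()
    n_genes = len(samples[0])
    score = []
    for g in range(n_genes):
        mean_pos = sum(s[g] for s in pos) / len(pos)
        mean_neg = sum(s[g] for s in neg) / len(neg)
        score.append(abs(mean_pos - mean_neg))
    ranked = sorted(range(n_genes), key=lambda g: score[g], reverse=True)
    return set(ranked[:k])


def bootstrap_confidence(samples, labels, k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each gene is selected."""
    rng = random.Random(seed)
    n = len(samples)
    counts = [0] * len(samples[0])
    for _ in range(n_boot):
        # Non-parametric bootstrap: draw n samples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = select_top_genes(
            [samples[i] for i in idx], [labels[i] for i in idx], k
        )
        for g in chosen:
            counts[g] += 1
    return [c / n_boot for c in counts]


# Toy data: 20 samples, 5 genes; only gene 0 differs between the two classes.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
samples = [
    [data_rng.gauss(0.0, 1.0) + (4.0 if (g == 0 and y == 1) else 0.0)
     for g in range(5)]
    for y in labels
]

confidence = bootstrap_confidence(samples, labels, k=2)
print([round(c, 2) for c in confidence])
```

Genes that are selected in nearly every resample can be reported with high confidence, while genes that appear only occasionally are more likely to be artifacts of a particular sample.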


PLOS ONE | 2017

CyNetSVM: A Cytoscape App for Cancer Biomarker Identification Using Network Constrained Support Vector Machines

Xu Shi; Sharmi Banerjee; Li Chen; Leena Hilakivi-Clarke; Robert Clarke; Jianhua Xuan

One of the important tasks in cancer research is to identify biomarkers and build classification models for clinical outcome prediction. In this paper, we develop a CyNetSVM software package, implemented in Java and integrated with Cytoscape as an app, to identify network biomarkers using network-constrained support vector machines (NetSVM). The Cytoscape app of NetSVM is specifically designed to improve the usability of NetSVM with the following enhancements: (1) a user-friendly graphical user interface (GUI), (2) a computationally efficient core program and (3) a convenient network visualization capability. The CyNetSVM app has been used to analyze breast cancer data to identify network genes associated with breast cancer recurrence. The biological function of these network genes is enriched in signaling pathways associated with breast cancer progression, showing the effectiveness of CyNetSVM for cancer biomarker identification. The CyNetSVM package is available at the Cytoscape App Store and http://sourceforge.net/projects/netsvmjava; a sample data set is also provided at sourceforge.net.


Bioinformatics | 2017

PSSV: a novel pattern-based probabilistic approach for somatic structural variation identification

Xi Chen; Xu Shi; Leena Hilakivi-Clarke; Ayesha N. Shajahan-Haq; Robert Clarke; Jianhua Xuan

Motivation: Whole genome DNA-sequencing (WGS) of paired tumor and normal samples has enabled the identification of somatic DNA changes in unprecedented detail. Large-scale identification of somatic structural variations (SVs) for a specific cancer type will deepen our understanding of driver mechanisms in cancer progression. However, the limited number of WGS samples, insufficient read coverage, and the impurity of tumor samples, which contain both normal and neoplastic cells, limit reliable and accurate detection of somatic SVs. Results: We present a novel pattern-based probabilistic approach, PSSV, to identify somatic structural variations from WGS data. PSSV features a mixture model with hidden states representing different mutation patterns; PSSV can thus differentiate heterozygous and homozygous SVs in each sample, enabling the identification of those somatic SVs with heterozygous mutations in normal samples and homozygous mutations in tumor samples. Simulation studies demonstrate that PSSV outperforms existing tools. PSSV has been successfully applied to breast cancer data to identify somatic SVs of key factors associated with breast cancer development. Availability and Implementation: An R package of PSSV is available at http://www.cbil.ece.vt.edu/software.htm. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


bioRxiv | 2018

ChIP-BIT2: a software tool to detect weak binding events using a Bayesian integration approach

Xi Chen; Xu Shi; Leena Hilakivi-Clarke; Robert Clarke; Tian Li Wang; Jianhua Xuan

Transcription factor binding events play important functional roles in gene regulation. It is, however, challenging to detect weak binding events because weak binding signals are difficult to differentiate from background signals. We present a software package, ChIP-BIT2, to identify weak binding events using a Bayesian integration approach. By integrating signals from sample and input ChIP-seq data, ChIP-BIT2 can effectively detect both strong and weak binding events at gene promoters, at enhancers or across the whole genome. The ChIP-BIT2 package has been extensively tested on ChIP-seq data, demonstrating its wide applicability in ChIP-seq data analysis. Availability and Implementation: The ChIP-BIT2 package is available at http://sourceforge.net/projects/chipbitc/.


Bioinformatics | 2018

SparseIso: a novel Bayesian approach to identify alternatively spliced isoforms from RNA-seq data

Xu Shi; Xiao Wang; Tian Li Wang; Leena Hilakivi-Clarke; Robert Clarke; Jianhua Xuan

Motivation: Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. It is important to accurately assemble transcripts or identify isoforms for an improved understanding of molecular mechanisms in biological systems. Results: We have developed a novel Bayesian method, SparseIso, to reliably identify spliced isoforms from RNA-seq data. A spike-and-slab prior is incorporated into the Bayesian model to enforce sparsity in isoform identification, effectively alleviating the problem of overfitting. A Gibbs sampling procedure is further developed to simultaneously identify and quantify transcripts from RNA-seq data. With the sampling approach, SparseIso estimates the joint distribution of all candidate transcripts, resulting in significantly improved performance in detecting lowly expressed transcripts and multiple expressed isoforms of genes. Both the simulation study and the real data analysis have demonstrated that the proposed SparseIso method significantly outperforms existing methods in transcript assembly and isoform identification. Availability and implementation: The SparseIso package is available at http://github.com/henryxushi/SparseIso. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
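To show how a spike-and-slab prior enforces sparsity, the sketch below draws weights from a generic version of such a prior: each candidate gets exactly zero weight (the spike) unless it is included, in which case its weight comes from a broad distribution (the slab). The Gaussian slab, the inclusion probability, and the function name `sample_spike_and_slab` are assumptions made for illustration; the actual prior and Gibbs sampler in SparseIso are not reproduced here.

```python
import random


def sample_spike_and_slab(n, inclusion_prob, slab_sd, rng):
    """Draw n weights: exactly 0 with prob 1 - inclusion_prob, else N(0, slab_sd^2).

    Assumption: a zero-mean Gaussian slab, chosen only for illustration.
    """
    draws = []
    for _ in range(n):
        if rng.random() < inclusion_prob:
            draws.append(rng.gauss(0.0, slab_sd))  # slab: candidate is included
        else:
            draws.append(0.0)  # spike: candidate is switched off entirely
    return draws


rng = random.Random(42)
weights = sample_spike_and_slab(n=1000, inclusion_prob=0.1, slab_sd=2.0, rng=rng)
nonzero = sum(1 for w in weights if w != 0.0)
print(f"{nonzero} of {len(weights)} candidates have non-zero prior weight")
```

With a small inclusion probability, most candidates are assigned exactly zero a priori, so the data must supply evidence before a candidate isoform enters the model. That is the mechanism by which such a prior discourages overfitting.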


International Conference of the IEEE Engineering in Medicine and Biology Society | 2014

BSSV: Bayesian based somatic structural variation identification with whole genome DNA-seq data

Xi Chen; Xu Shi; Ayesha N. Shajahan; Leena Hilakivi-Clarke; Robert Clarke; Jianhua Xuan

High-coverage whole genome DNA-sequencing makes somatic structural variation (SSV) more evident in paired tumor and normal samples and thus easier to identify. Recent studies show that simultaneous analysis of paired samples provides a better resolution of SSV detection than subtracting shared SVs. However, available tools can neither identify all types of SSVs nor provide any rank information regarding their somatic features. In this paper, we have developed a Bayesian framework, called BSSV, that integrates read alignment information from both tumor and normal samples to calculate the significance of each SSV. When tested on simulated data, the precision of BSSV is comparable to that of available tools and the false negative rate is significantly lower. We have also applied this approach to The Cancer Genome Atlas breast cancer data for SSV detection. Many genes known to be mutated specifically in breast cancer, such as RAD51, BRIP1, ER, PGR and PTPRD, have been successfully identified.


Bioinformatics and Biomedicine | 2013

A novel statistical approach to identify co-regulatory gene modules

Xi Chen; Jianhua Xuan; Xu Shi; Ayesha N. Shajahan-Haq; Leena Hilakivi-Clarke; Robert Clarke

ChIP-chip experiments are performed to determine binding sites for transcription factors (TFs). Conventional TF-gene regulation is generated based on a p-value cutoff of the binding sites as well as their distance to the nearest genes. Taking into account that binding sites of one ChIP-chip experiment should follow the same specific location distribution, we proposed a statistical model using both location and significance information to weigh target genes. With multiple ChIP-chip experiments and gene expression data, we identified co-regulatory and differentially expressed gene modules with a joint clustering and Metropolis sampling approach. We demonstrated the efficiency of our method on a ChIP-chip data set with 38 breast cancer related TFs.


International Conference on Bioinformatics | 2014

Statistical Identification of Co-regulatory Gene Modules using Multiple ChIP-Seq Experiments

Xi Chen; Xu Shi; Ayesha N. Shajahan-Haq; Leena Hilakivi-Clarke; Robert Clarke; Jianhua Xuan

Collaboration


Dive into Xu Shi's collaborations.

Top Co-Authors

Robert Clarke

Lawrence Berkeley National Laboratory

Ayesha N. Shajahan-Haq

Georgetown University Medical Center

Tian Li Wang

Johns Hopkins University
