Publication


Featured research published by Baoju Zhang.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2013

Sparse generalized canonical correlation analysis for biological model integration: A genetic study of psychiatric disorders

Mingon Kang; Baoju Zhang; Xiaoyong Wu; Chunyu Liu; Jean Gao

In the post-genomic era, unveiling causal traits in the complex mechanisms underlying a number of diseases has been highlighted as one of the key goals. Much recent research has suggested that integrative approaches combining genome-wide association studies (GWAS) and gene expression profiling-based studies provide greater insight into the mechanism than either approach alone. In this paper, we propose a novel method, sparse generalized canonical correlation analysis (SGCCA), to integrate multiple types of biological data such as genetic markers, gene expression, and disease phenotypes. The proposed method provides a powerful approach for comprehensively analyzing complex biological mechanisms while using multiple data types simultaneously. It is also designed to identify the few elements that are significantly involved in the system among the large number of elements within the variable sets. A further advantage of the method is that its solutions are easy to interpret. To verify the performance of SGCCA, we performed experiments with simulated data and with human brain data from psychiatric disorders, assessing its ability to detect the significant elements of the sets and the relations within the complex system.
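The abstract describes SGCCA only at a high level. As a rough illustration of the general idea (one weight vector per data block, sparsified so that only a few variables per block are selected), the following Python sketch uses a simplified block-wise soft-thresholding update. It is a toy under stated assumptions, not the authors' algorithm: the threshold rule `frac * max|g|`, the fully connected `design` matrix, the function names `sparse_gcca` and `soft_threshold`, and the simulated blocks are all hypothetical, and published SGCCA formulations constrain an L1 norm rather than using a proportional threshold.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding: sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_gcca(blocks, design, frac=0.5, n_iter=50):
    """One sparse weight vector per block (simplified, hypothetical variant).

    blocks : list of (n, p_j) arrays whose rows are the same samples
    design : (J, J) 0/1 array; design[j][k] = 1 links blocks j and k
    frac   : soft threshold for block j is frac * max|g_j| on every sweep
    """
    X = [b - b.mean(axis=0) for b in blocks]             # center every variable
    w = [np.ones(x.shape[1]) / np.sqrt(x.shape[1]) for x in X]
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Sum of the current component scores of the blocks linked to j.
            z = sum(design[j][k] * (X[k] @ w[k]) for k in range(len(X)) if k != j)
            g = xj.T @ z                                  # covariance with linked scores
            wj = soft_threshold(g, frac * np.abs(g).max())
            w[j] = wj / np.linalg.norm(wj)                # rescale to unit L2 norm
    return w

# Hypothetical toy data: continuous stand-ins for markers, expression, phenotype,
# all driven by one shared latent factor t in a few planted columns.
rng = np.random.default_rng(0)
n = 200
t = rng.normal(size=n)
markers = rng.normal(size=(n, 50))
markers[:, :5] = t[:, None] + 0.5 * rng.normal(size=(n, 5))
expr = rng.normal(size=(n, 30))
expr[:, :3] = t[:, None] + 0.5 * rng.normal(size=(n, 3))
pheno = (t + 0.5 * rng.normal(size=n))[:, None]

w_markers, w_expr, w_pheno = sparse_gcca([markers, expr, pheno], np.ones((3, 3)) - np.eye(3))
print(np.flatnonzero(w_markers), np.flatnonzero(w_expr))
```

In this toy run the nonzero weights should fall on the planted columns (among the first five marker columns and the first three expression columns), which is the kind of sparse, interpretable output the abstract emphasizes.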


Network Modeling Analysis in Health Informatics and BioInformatics | 2015

Inferring disease associations of the long non-coding RNAs through non-negative matrix factorization

Ashis Kumer Biswas; Mingon Kang; Dong Chul Kim; Chris H. Q. Ding; Baoju Zhang; Xiaoyong Wu; Jean Gao

Long non-coding RNAs (lncRNAs) have been implicated in various biological processes and linked to many dysregulations. Over the past decade, researchers have reported a large number of human disease associations with lncRNAs, both intergenic lncRNAs (lincRNAs) and non-intergenic lncRNAs. The next-generation sequencing platform RNA-seq has also enabled researchers to quantify the expression profile of each lncRNA in human tissue samples. In this article, we adapted the non-negative matrix factorization (NMF) method to develop a low-rank computational model that describes the existing knowledge about both non-intergenic and intergenic lncRNA-disease associations, represented in a two-dimensional association matrix, and that also provides a way to rank disease-causing lncRNAs. We proposed several NMF formulations for the problem and found that the sparsity-constrained NMF produced the best model. By exploiting the inherent bi-clustering ability of the NMF models, we extracted several lncRNA groups and disease groups that possess biological significance. Moreover, we proposed an integrative NMF formulation that incorporates, along with the coding gene and lincRNA disease association data, prior knowledge about the relationship networks among coding genes and lincRNAs and RNA-seq expression profile data, in order to identify potential lincRNA-coding gene co-modules; these co-modules further enhanced the lincRNA-disease associations and shed light on the functional chemistry of intergenic lncRNAs. Experimental results show the superiority of our proposed method over two state-of-the-art clustering algorithms, k-means and hierarchical clustering.
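To make the "low-rank model" and "ranking" ideas concrete, here is a minimal Python sketch assuming plain Frobenius-norm NMF with the standard Lee-Seung multiplicative updates; the paper's sparsity-constrained and integrative formulations are not reproduced. Unobserved lncRNA-disease pairs are ranked by their reconstructed score in `W @ H`. The function names, the toy 6 x 4 association matrix, and the choice of rank 2 are hypothetical.

```python
import numpy as np

def nmf(A, rank, n_iter=1000, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||A - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], rank))
    H = rng.random((rank, A.shape[1]))
    for _ in range(n_iter):
        # Ratios are nonnegative, so W and H stay nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def rank_candidates(A, W, H, top=3):
    """Return the `top` unobserved (lncRNA, disease) pairs with the highest W @ H score."""
    scores = W @ H
    scores[A > 0] = -np.inf                       # skip associations already known
    order = np.argsort(scores, axis=None)[::-1][:top]
    return [tuple(int(i) for i in np.unravel_index(k, A.shape)) for k in order]

# Hypothetical toy matrix: rows are lncRNAs, columns are diseases, 1 = known link.
A = np.zeros((6, 4))
A[:3, :2] = 1
A[3:, 2:] = 1
A[1, 1] = 0                                       # one link "not yet reported"
W, H = nmf(A, rank=2)
candidates = rank_candidates(A, W, H)
print(candidates)
```

The only design choice here is that known associations are masked out before ranking, so the returned list contains only new candidate pairs.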


International Conference on Communications | 2014

A Unified Probabilistic PLSR Model for Quantitative Analysis of Surface-Enhanced Raman Spectrum (SERS)

Shuo Li; Jean Gao; James O. Nyagilo; Digant P. Dave; Baoju Zhang; Xiaoyong Wu

Gold surface-enhanced Raman scattering (Au SERS) nanoparticles in combination with Raman spectroscopy have emerged as a new, sensitive, non-invasive molecular imaging technology. Their multiplexing capability enables the technology to detect and separate multiple biomarkers with picomolar sensitivity. In this study, we demonstrate the ability of Raman spectroscopy to separate the different fingerprints of Au SERS nanotags. Quantitative analysis of Raman spectrum data usually faces the challenge of high-dimensional variables with a small number of samples. The commonly applied partial least squares (PLS) regression algorithms, including PLS2 and SIMPLS, cannot avoid overfitting small data sets. In this paper, we present a unified probabilistic PLSR model, called PPLSR, derived from the concepts of probabilistic principal component analysis (PPCA) and probabilistic canonical correlation analysis (PCCA), to identify the spectral fingerprints from the measured mixed Raman signals. This model partitions the observed variables into a systematic part governed by a few latent variables and an unrelated noise part that captures the uncertainty of the data sets. As a general methodology, it provides a solid foundation for developing Bayesian nonparametric models and helps to build more robust models. Experimental results are shown on Raman spectrum data using up to five different types of Au SERS nanotags in different combinations and mixing ratios. Quantitative analyses using the proposed model and the comparison methods are reported under two cross-validation methods.
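One way to write down a model of the kind described, assuming PPCA-style isotropic noise in each block, is shown below. Here x is a measured spectrum over p wavenumbers, y holds the mixing ratios of q nanotag types, and t is a k-dimensional latent variable; the symbols W, Q, σ_x², σ_y², μ_x, μ_y are introduced for illustration, and the paper's exact parameterization may differ. The prediction rule follows from standard Gaussian conditioning, as in probabilistic PCA.

```latex
\begin{aligned}
\mathbf{t} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k), \\
\mathbf{x} \mid \mathbf{t} &\sim \mathcal{N}(\mathbf{W}\mathbf{t} + \boldsymbol{\mu}_x,\ \sigma_x^2 \mathbf{I}_p), \\
\mathbf{y} \mid \mathbf{t} &\sim \mathcal{N}(\mathbf{Q}\mathbf{t} + \boldsymbol{\mu}_y,\ \sigma_y^2 \mathbf{I}_q), \\
\mathbb{E}[\mathbf{t} \mid \mathbf{x}] &= (\mathbf{W}^\top \mathbf{W} + \sigma_x^2 \mathbf{I}_k)^{-1} \mathbf{W}^\top (\mathbf{x} - \boldsymbol{\mu}_x), \\
\hat{\mathbf{y}}(\mathbf{x}) = \mathbb{E}[\mathbf{y} \mid \mathbf{x}] &= \boldsymbol{\mu}_y + \mathbf{Q}\, \mathbb{E}[\mathbf{t} \mid \mathbf{x}].
\end{aligned}
```

The "systematic part" of the abstract corresponds to Wt and Qt, and the "noise part" to the isotropic Gaussian terms with variances σ_x² and σ_y².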


Bioinformatics and Bioengineering | 2014

Integration of DNA Methylation, Copy Number Variation, and Gene Expression for Gene Regulatory Network Inference and Application to Psychiatric Disorders

Dong Chul Kim; Mingon Kang; Baoju Zhang; Xiaoyong Wu; Chunyu Liu; Jean Gao

Biological network inference is a crucial problem in bioinformatics, as most biological processes are based on biomolecular interactions. Many researchers have worked in particular on the inference of gene regulatory networks, in which nodes represent genes and edges represent regulatory relationships, under the assumption that a gene can regulate another gene indirectly. However, a gene's expression level can be influenced not only by genes and proteins but also by other biological factors, so inference could be more effective if those factors were considered. In this paper, we propose an integrative approach to infer gene regulatory networks in which a gene can be regulated not only by genes but also by DNA methylation and copy number variation. We assume that a gene can be directly regulated by at most one DNA methylation and one copy number variation. Simulation results show that our method outperforms popular and state-of-the-art methods of biological network inference. In addition, we applied the proposed method to psychiatric disorder data. The inferred networks reveal relationships within a set of genes that are more likely to be regulated by the DNA methylation and copy number variation of those genes.
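The modelling constraint in the abstract (each gene may have at most one DNA methylation regulator and at most one copy number regulator) can be illustrated with a deliberately simple linear sketch. This is not the authors' inference method: it assumes a linear model, regresses each gene on the other genes by ordinary least squares, and then attaches to the residual the single methylation feature and the single CNV feature with the largest absolute correlation, or none if that correlation is below a hypothetical cutoff `min_corr`. All names and the simulated data are illustrative.

```python
import numpy as np

def best_single(residual, candidates, min_corr):
    """Index of the candidate column most correlated with residual, or None."""
    r = residual - residual.mean()
    c = candidates - candidates.mean(axis=0)
    corr = (c.T @ r) / (np.linalg.norm(c, axis=0) * np.linalg.norm(r))
    k = int(np.argmax(np.abs(corr)))
    return k if abs(corr[k]) >= min_corr else None   # "at most one" regulator

def infer_regulators(E, M, C, min_corr=0.3):
    """For each gene: regress on the other genes (OLS), then attach at most one
    methylation column of M and one CNV column of C to the unexplained residual.
    Hypothetical linear sketch; E, M, C are (samples x features) arrays."""
    E = E - E.mean(axis=0)
    result = {}
    for g in range(E.shape[1]):
        others = np.delete(E, g, axis=1)
        coef, *_ = np.linalg.lstsq(others, E[:, g], rcond=None)
        residual = E[:, g] - others @ coef
        result[g] = {"methylation": best_single(residual, M, min_corr),
                     "cnv": best_single(residual, C, min_corr)}
    return result

# Hypothetical simulation: gene 0 is driven by methylation feature 2.
rng = np.random.default_rng(1)
n = 200
M = rng.normal(size=(n, 4))          # DNA methylation features
C = rng.normal(size=(n, 4))          # copy number features
E = rng.normal(size=(n, 5))          # gene expression
E[:, 0] += 2.0 * M[:, 2]
regs = infer_regulators(E, M, C)
print(regs[0], regs[1])
```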


Journal of Bioinformatics and Computational Biology | 2013

CNCTDiscriminator: Coding and noncoding transcript discriminator - An excursion through hypothesis learning and ensemble learning approaches

Ashis Kumer Biswas; Baoju Zhang; Xiaoyong Wu; Jean Gao

Statistics about open reading frames (ORFs), base compositions, and the properties of predicted secondary structures have the potential to address the problem of discriminating coding from noncoding transcripts. In addition, the next-generation sequencing platform RNA-seq provides a wealth of data from which transcript expression profiles can be extracted, which prompted us to add a new set of dimensions to this classification task. In this paper, we propose CNCTDiscriminator, a coding and noncoding transcript discriminating system that integrates these four categories of transcript features. Feature integration was done using both hypothesis learning and feature-specific ensemble learning approaches. The CNCTDiscriminator model trained with composition and ORF features (precision 83.86%, recall 82.01%) outperforms three other popular methods, CPC (precision 98.31%, recall 25.95%), CPAT (precision 97.74%, recall 52.50%), and PORTRAIT (precision 84.37%, recall 73.2%), when applied to an independent benchmark dataset. The CNCTDiscriminator model trained using the ensemble approach shows comparable performance (precision 89.85%, recall 71.08%).
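The abstract says the composition + ORF model outperforms the other tools even though CPC and CPAT report higher precision. One common way to compare such precision/recall pairs with a single number is the F1 score, their harmonic mean; the abstract does not report F1, so the short Python sketch below simply computes it from the reported figures.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %) exactly as reported in the abstract
reported = {
    "CNCTDiscriminator (composition + ORF)": (83.86, 82.01),
    "CNCTDiscriminator (ensemble)": (89.85, 71.08),
    "CPC": (98.31, 25.95),
    "CPAT": (97.74, 52.50),
    "PORTRAIT": (84.37, 73.2),
}
scores = {name: round(f1(p, r), 2) for name, (p, r) in reported.items()}
for name, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:40s} F1 = {s:.2f}")
```

This gives an F1 of about 82.9 for the composition + ORF model, versus about 79.4 for the ensemble model, 78.4 for PORTRAIT, 68.3 for CPAT, and 41.1 for CPC, which is consistent with the ranking stated in the abstract.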


International Conference of the IEEE Engineering in Medicine and Biology Society | 2011

Discovery of lung cancer pathways using Reverse Phase Protein Microarray and prior-knowledge based Bayesian networks

Dong Chul Kim; Chin Rang Yang; Xiaoyu Wang; Baoju Zhang; Xiaorong Wu; Jean Gao

The goal of this paper is to infer the signaling pathway related to lung cancer using Reverse Phase Protein Microarray (RPPM) data, which provide information on post-translational phosphorylation events. The pathway is inferred computationally by learning a Bayesian network in combination with prior knowledge from protein-protein interactions (PPI). A clustering-based linear programming relaxation is developed to search for optimal networks. The PPI prior knowledge is incorporated into a new scoring function based on the minimum description length (MDL). In the experiments, we first evaluate the algorithm's performance on synthetic networks and associated data. We then present our signaling network inference for lung cancer using RPPM data. Through this study, we expect to derive new signaling pathways and insight into protein regulatory relationships that are not yet known in lung cancer.
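The abstract does not spell out the MDL scoring function, so the following Python sketch is only an assumed form: a standard MDL score for one node and its parents in a discrete Bayesian network (log-likelihood minus (log N)/2 times the number of free parameters), plus a hypothetical penalty `lam` for every parent edge not supported by a protein-protein interaction. The function `family_mdl`, the penalty form, and the toy data are illustrative; the score is the negative description length, so higher is better.

```python
import math
import random
from collections import Counter

def family_mdl(data, child, parents, arity, ppi, lam=1.0):
    """Negative MDL score of `child` given `parents` (higher is better).

    data  : list of dicts mapping variable name -> discrete state
    arity : dict mapping variable name -> number of states
    ppi   : set of frozenset pairs treated as prior-supported edges
    lam   : hypothetical cost for each parent edge lacking PPI support
    """
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    q = math.prod(arity[p] for p in parents)           # parent configurations
    n_params = q * (arity[child] - 1)
    model_cost = 0.5 * math.log(n) * n_params
    prior_cost = lam * sum(1 for p in parents if frozenset((p, child)) not in ppi)
    return loglik - model_cost - prior_cost

# Hypothetical toy data: C is a noisy copy of A; B is unrelated.
rng = random.Random(0)
data = []
for _ in range(500):
    a, b = rng.randint(0, 1), rng.randint(0, 1)
    c = a if rng.random() < 0.9 else 1 - a
    data.append({"A": a, "B": b, "C": c})
arity = {"A": 2, "B": 2, "C": 2}
ppi = {frozenset(("A", "C"))}                          # assumed prior: A interacts with C

s_empty = family_mdl(data, "C", [], arity, ppi)
s_a = family_mdl(data, "C", ["A"], arity, ppi)
s_b = family_mdl(data, "C", ["B"], arity, ppi)
print(round(s_empty, 1), round(s_a, 1), round(s_b, 1))
```

A structure search (such as the linear programming relaxation mentioned in the abstract) would compare many parent sets using a decomposable score of this kind.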


Data Mining in Bioinformatics | 2015

Probabilistic partial least squares regression for quantitative analysis of Raman spectra

Shuo Li; James O. Nyagilo; Digant P. Dave; Wei Wang; Baoju Zhang; Jean Gao

With the latest developments in the surface-enhanced Raman scattering (SERS) technique, quantitative analysis of Raman spectra has shown promising potential for in vivo molecular imaging. Partial least squares regression (PLSR) is a state-of-the-art method, but it relies only on training samples, which makes it difficult to incorporate complex domain knowledge. Based on probabilistic principal component analysis (PCA) and the idea of probabilistic curve fitting, we propose a probabilistic PLSR (PPLSR) model and an expectation-maximization (EM) algorithm for estimating its parameters. This model explains PLSR from a probabilistic viewpoint, describes its essential meaning, and provides a foundation for developing future Bayesian nonparametric models. Two real Raman spectra datasets were used to evaluate the model, and the experimental results show its effectiveness.
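A minimal sketch of an EM fit, assuming a model in the spirit of probabilistic PCA applied to the stacked vector z = [x; y] with a shared latent t, separate isotropic noise variances for the x block and the y block, and prediction by E[y | x]. This is an illustrative reading of "probabilistic PLSR", not the published PPLSR algorithm; the names `fit_ppls` and `predict_ppls`, the random initialization, the fixed iteration count, and the simulated data are assumptions.

```python
import numpy as np

def fit_ppls(X, Y, k, n_iter=300, seed=0):
    """EM for z = [x; y] = [W; Q] t + noise, t ~ N(0, I_k), with noise variance
    sx2 on every x coordinate and sy2 on every y coordinate (assumed model)."""
    rng = np.random.default_rng(seed)
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Z = np.hstack([X - mx, Y - my])
    n, d = Z.shape
    p = X.shape[1]
    A = rng.normal(size=(d, k))                        # stacked loadings [W; Q]
    psi = np.ones(d)                                   # per-coordinate noise variance
    for _ in range(n_iter):
        # E-step: posterior of t_n is N(Sigma A^T Psi^{-1} z_n, Sigma).
        Ap = A / psi[:, None]
        Sigma = np.linalg.inv(np.eye(k) + A.T @ Ap)
        Et = Z @ Ap @ Sigma                            # row n holds E[t_n]
        Stt = n * Sigma + Et.T @ Et                    # sum_n E[t_n t_n^T]
        # M-step: loadings, then noise variances tied within the x and y blocks.
        ZEt = Z.T @ Et
        A = ZEt @ np.linalg.inv(Stt)
        resid = (np.sum(Z**2, axis=0) - np.sum(A * ZEt, axis=1)) / n
        psi = np.concatenate([np.full(p, resid[:p].mean()),
                              np.full(d - p, resid[p:].mean())])
    return {"W": A[:p], "Q": A[p:], "sx2": psi[0], "sy2": psi[-1], "mx": mx, "my": my}

def predict_ppls(model, X):
    """E[y | x] = my + Q (W^T W + sx2 I)^{-1} W^T (x - mx), applied row-wise."""
    W, Q = model["W"], model["Q"]
    k = W.shape[1]
    Et = (X - model["mx"]) @ W @ np.linalg.inv(W.T @ W + model["sx2"] * np.eye(k))
    return model["my"] + Et @ Q.T

# Hypothetical synthetic data: 20 "spectral" channels, 3 "ratio" outputs, 2 latents.
rng = np.random.default_rng(42)
n, p, q, k = 600, 20, 3, 2
W_true, Q_true = rng.normal(size=(p, k)), rng.normal(size=(q, k))
T = rng.normal(size=(n, k))
X = T @ W_true.T + 0.1 * rng.normal(size=(n, p))
Y = T @ Q_true.T + 0.1 * rng.normal(size=(n, q))
model = fit_ppls(X[:400], Y[:400], k=2)
Y_hat = predict_ppls(model, X[400:])
r2 = 1 - np.sum((Y[400:] - Y_hat) ** 2) / np.sum((Y[400:] - Y[400:].mean(axis=0)) ** 2)
print(f"held-out R^2 = {r2:.3f}")
```

Because x and y share the same latent t, the rotation ambiguity of the loadings cancels in E[y | x], so predictions do not depend on which equivalent solution EM lands on.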


Bioinformatics and Bioengineering | 2014

NMF-Based LncRNA-Disease Association Inference and Bi-Clustering

Ashis Kumer Biswas; Jean Gao; Baoju Zhang; Xiaoyong Wu

Long non-coding RNAs (lncRNAs) have been implicated in various biological processes and linked to many dysregulations. Over the past decade, researchers have reported a large number of lncRNA-associated human diseases. In this article, we employed the non-negative matrix factorization (NMF) method to develop a low-dimensional computational model that describes the existing knowledge about lncRNA-disease associations represented in a two-dimensional association matrix. The non-negativity constraints on the matrix and its factors ensure that each lncRNA's disease profile can be represented as an additive linear combination of the latent coordinates. To learn such a constrained model from an incomplete association matrix, we developed several NMF formulations. In our experiments, the sparse NMF produced the best model of all the formulations. Moreover, by exploiting the inherent bi-clustering ability of the NMF models, we extracted several lncRNA groups and disease groups that possess biological significance.
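The ideas of an incomplete association matrix, a sparse NMF, and bi-clustering can be illustrated together with a small Python sketch. It assumes a mask-weighted Frobenius loss (so missing entries do not count), an L1 penalty `beta` on H as the sparsity device, and bi-cluster assignment by taking each row's and each column's dominant factor. These are common choices but not necessarily the formulations used in the paper, and all names and data here are hypothetical.

```python
import numpy as np

def masked_sparse_nmf(A, mask, rank, beta=0.01, n_iter=1000, eps=1e-9, seed=0):
    """Multiplicative updates for the assumed objective
        0.5 * ||mask * (A - W H)||_F^2 + beta * sum(H),   W, H >= 0,
    where mask[i, j] = 1 if the entry is observed and 0 if it is missing."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], rank))
    H = rng.random((rank, A.shape[1]))
    MA = mask * A
    losses = []
    for _ in range(n_iter):
        # beta enters the denominator of the H update: the L1 gradient term.
        H *= (W.T @ MA) / (W.T @ (mask * (W @ H)) + beta + eps)
        W *= (MA @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        losses.append(0.5 * np.sum((mask * (A - W @ H)) ** 2) + beta * H.sum())
    return W, H, losses

def biclusters(W, H):
    """Assign each lncRNA (row) and each disease (column) to its dominant factor."""
    return W.argmax(axis=1), H.argmax(axis=0)

# Hypothetical toy association matrix with one unobserved entry.
A = np.zeros((6, 4))
A[:3, :2] = 1
A[3:, 2:] = 1
mask = np.ones_like(A)
mask[0, 3] = 0
W, H, losses = masked_sparse_nmf(A, mask, rank=2)
row_labels, col_labels = biclusters(W, H)
print(row_labels, col_labels, round(losses[-1], 4))
```

The masked entry contributes nothing to the loss, so its reconstructed value `(W @ H)[0, 3]` acts as a prediction for the missing association.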


International Conference on Machine Learning and Applications | 2013

eQTL Mapping Study via Regularized Sparse Canonical Correlation Analysis

Mingon Kang; Shuo Li; Dong Chul Kim; Chunyu Liu; Baoju Zhang; Xiaoyong Wu; Jean Gao

While genome-wide association studies (GWAS) have focused on discovering genetic loci mapped to a disease, expression quantitative trait loci (eQTL) studies incorporate microarray data and provide a powerful approach. Microarrays allow one to measure thousands of gene expression levels simultaneously, and advances in eQTL studies give insight into the genetic architecture of gene expression. A number of multivariate methods have recently been proposed to identify genetic loci linked to gene expression by taking into account joint effects and relationships among loci rather than analyzing each locus independently. However, previous research has limitations, such as lacking support for the cis/trans-eQTL model, which has come to be accepted as a general genetic model. We propose a novel regularized eQTL association mapping detection (Reg-AMADE) method, focusing on three problems. First, we need to take co-expressed genes into account without using clustering or partitioning techniques, while also detecting linkage disequilibrium and the joint effect of multiple genetic markers. Second, we need to build a regularized model that supports the cis- and trans-eQTL model observed in most association studies. Lastly, we need to discover the significant genes underlying the diseases rather than a common component. We also propose a new simulation method that reproduces practical situations, so that results can be evaluated realistically rather than with random samples generated from multivariate normal distributions, as most previous research has done. The power to detect both the joint effect and the grouping effect of SNPs and gene expression is assessed in a simulation study.
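The abstract argues for simulations that are more realistic than multivariate normal samples. Below is a minimal Python sketch of one such design, under assumptions chosen for illustration: genotypes take values 0/1/2, linkage disequilibrium is imitated by letting each haplotype allele copy its left neighbour with probability r, and expression is a linear function of a few causal SNPs plus Gaussian noise. The parameter values and the cis/trans labelling of SNPs are hypothetical, and Reg-AMADE itself is not implemented here.

```python
import numpy as np

def simulate_haplotypes(rng, n, n_snps, maf, r):
    """0/1 alleles along a chromosome: each allele copies its left neighbour
    with probability r (LD); otherwise it is redrawn with frequency maf."""
    h = np.empty((n, n_snps), dtype=int)
    h[:, 0] = rng.random(n) < maf
    for j in range(1, n_snps):
        copy = rng.random(n) < r
        fresh = rng.random(n) < maf
        h[:, j] = np.where(copy, h[:, j - 1], fresh)
    return h

def simulate_eqtl(n=2000, n_snps=50, maf=0.3, r=0.9, effects=None, noise=1.0, seed=0):
    """Genotypes in {0, 1, 2} (two haplotypes summed) and expression driven by
    causal SNPs. effects maps gene index -> list of (snp_index, effect_size)."""
    rng = np.random.default_rng(seed)
    G = (simulate_haplotypes(rng, n, n_snps, maf, r)
         + simulate_haplotypes(rng, n, n_snps, maf, r))
    # Hypothetical defaults: SNP 5 plays a "cis" role for gene 0, SNP 40 a "trans" role for gene 1.
    effects = effects or {0: [(5, 0.8)], 1: [(40, 0.8)]}
    E = noise * rng.normal(size=(n, len(effects)))
    for g, snps in effects.items():
        for s, beta in snps:
            E[:, g] += beta * G[:, s]
    return G, E

G, E = simulate_eqtl()
print(np.corrcoef(G[:, 0], G[:, 1])[0, 1], np.corrcoef(G[:, 0], G[:, -1])[0, 1])
```

With this construction the correlation between genotypes at SNPs j and j + m decays roughly as r^m, so neighbouring markers are strongly correlated while distant ones are nearly independent, unlike a generic multivariate normal draw.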


Data Mining in Bioinformatics | 2013

Eigenspectra, a robust regression method for multiplexed Raman spectra analysis

Shuo Li; James O. Nyagilo; Digant P. Dave; Baoju Zhang; Jean Gao

Raman spectroscopy is one of the most sensitive techniques and is widely used in chemical and pharmaceutical research. With the latest development of surface-enhanced Raman scattering (SERS) nanoparticles, its applications can now be extended to bioimaging and biosensing. In this study, we demonstrate the ability of Raman spectroscopy to separate multiple spectral fingerprints using Raman nanotags after injection. This capability will further allow the nanotags to be used as functional agents for diagnostic molecular imaging applications. In this paper, a machine learning method is proposed to estimate the mixing ratio of each source signal from a mixture signal. The method first decomposes the training mixture signal matrix into a number of components while maintaining the maximum linear relationship between the new coordinates and the ground-truth ratio matrix. A regression coefficient matrix is then formed from the component matrix. Traditional regression methods give poor decomposition results because various factors in sample preparation and machine operation make Raman spectra stochastic. The robustness of the proposed method was compared with that of least squares and weighted least squares methods.
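The baselines named in the abstract, least squares and weighted least squares unmixing, are easy to state: given reference spectra S (one row per nanotag) and a mixture x, find ratios c minimizing ||x - c S||², optionally weighting each wavenumber. The Python sketch below shows only these baselines, not the Eigenspectra method; the Gaussian-peak reference spectra, the noise profile, and the true ratios are synthetic and hypothetical.

```python
import numpy as np

def gaussian_peaks(wavenumbers, centers, width):
    """Toy reference spectrum: a sum of Gaussian peaks (hypothetical shapes)."""
    return sum(np.exp(-0.5 * ((wavenumbers - c) / width) ** 2) for c in centers)

def ls_unmix(S, x):
    """Least squares ratios c minimizing ||x - c @ S||^2; S has shape (k, p)."""
    c, *_ = np.linalg.lstsq(S.T, x, rcond=None)
    return c

def wls_unmix(S, x, w):
    """Weighted least squares with per-wavenumber weights w (e.g. 1 / noise variance)."""
    sw = np.sqrt(w)
    c, *_ = np.linalg.lstsq(S.T * sw[:, None], x * sw, rcond=None)
    return c

nu = np.linspace(400, 1800, 700)                        # wavenumber grid (cm^-1), toy
S = np.vstack([gaussian_peaks(nu, cs, 15.0)
               for cs in ([600, 1000], [800, 1400], [1200, 1600])])
c_true = np.array([0.5, 0.3, 0.2])                      # hypothetical mixing ratios
rng = np.random.default_rng(0)
sigma = 0.01 + 0.05 * (nu > 1500)                       # noisier above 1500 cm^-1 (toy)
x = c_true @ S + sigma * rng.normal(size=nu.size)
print(ls_unmix(S, x), wls_unmix(S, x, 1 / sigma**2))
```

Weighting by inverse noise variance down-weights the noisier region, which is the usual motivation for preferring weighted least squares when the noise level varies across the spectrum.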

Collaboration


Baoju Zhang's collaborations.

Top Co-Authors

Jean Gao, University of Texas at Arlington
Xiaoyong Wu, Tianjin Normal University
Ashis Kumer Biswas, University of Texas at Arlington
Dong Chul Kim, University of Texas at Arlington
Shuo Li, University of Texas at Arlington
Chunyu Liu, University of Illinois at Chicago
Mingon Kang, Kennesaw State University
Digant P. Dave, University of Texas at Arlington
James O. Nyagilo, University of Texas at Arlington
Wei Wang, Tianjin Normal University