Ying-Lian Gao
Qufu Normal University
Publication
Featured research published by Ying-Lian Gao.
IEEE Transactions on Nanobioscience | 2014
Jin-Xing Liu; Ying-Lian Gao; Yong Xu; Chun-Hou Zheng; Jane You
With the development of deep sequencing, vast amounts of RNA-Seq data have been generated. Extracting and interpreting the meaningful information contained in deep sequencing data is therefore crucial. In this paper, based on penalized matrix decomposition (PMD), a novel method named PMDSeq was proposed to analyze RNA-Seq count data. Firstly, the matrix of RNA-Seq count data was normalized to obtain the differential expression matrix. Secondly, the differential expression matrix was decomposed into three factor matrices. By imposing appropriate constraints on the factor matrices, the PMDSeq method can highlight the differentially expressed genes. Thirdly, the proposed method can identify the differentially expressed genes based on the scaled eigensamples. Finally, we used gene ontology tools to check these differentially expressed genes. The experimental results on simulated data and three real RNA-Seq count data sets demonstrated the effectiveness of our method.
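As a rough illustration of the decomposition step described above, the following sketch computes a rank-one sparse factorization by alternating soft-thresholding. It is not the PMDSeq implementation: the function names, the fixed thresholding fraction `frac`, and the toy data are all assumptions made for illustration, and the published PMD algorithm chooses its threshold by a search that enforces an L1 bound rather than using a fixed fraction.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink each entry of x toward zero by t (the L1 proximal operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def rank1_pmd(X, frac=0.5, n_iter=50):
    """Rank-one factorization X ~ d * u v^T with a sparse gene factor u.

    X is genes x samples. `frac` is a hypothetical knob: the threshold is
    set to frac * max|X v|, so with frac < 1 at least one gene survives.
    """
    # Start from the leading right singular vector of X.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        a = X @ v
        u = soft_threshold(a, frac * np.abs(a).max())
        u /= np.linalg.norm(u)
        v = X.T @ u
        v /= np.linalg.norm(v)
    d = u @ X @ v
    return u, d, v

# Toy data (assumed): 50 genes x 10 samples, genes 0..4 differ between groups.
rng = np.random.default_rng(0)
u_true = np.zeros(50)
u_true[:5] = 1.0
v_true = np.r_[np.ones(5), -np.ones(5)]
X = 5.0 * np.outer(u_true, v_true) + 0.1 * rng.standard_normal((50, 10))

u, d, v = rank1_pmd(X)
selected = np.flatnonzero(u)  # indices of genes with nonzero loading
```

Genes whose entry in `u` is nonzero are the candidates the sparse factor highlights; in this toy setting those are the planted genes.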
PLOS ONE | 2015
Jian Liu; Jin-Xing Liu; Ying-Lian Gao; Xiang-Zhen Kong; Xue-Song Wang; Dong Wang
In current molecular biology, it is becoming increasingly important to identify differentially expressed genes closely correlated with a key biological process from gene expression data. In this paper, based on the Schatten p-norm and Lp-norm, a novel p-norm robust feature extraction method is proposed to identify the differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix and the Lp-norm is taken as the error function to improve the robustness to outliers in the gene expression data. The results on simulated data show that our method achieves higher identification accuracy than competing methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data.
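The two quantities named in this abstract can be written down directly. The sketch below only evaluates them; it does not reproduce the optimization used in the paper, and the function names are assumptions. For p = 1 the Schatten p-norm is the nuclear norm and for p = 2 it is the Frobenius norm; for 0 < p < 1 both quantities are quasi-norms.

```python
import numpy as np

def schatten_p(X, p):
    """(sum_i sigma_i^p)^(1/p), taken over the singular values of X."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(sigma ** p) ** (1.0 / p))

def entrywise_lp(E, p):
    """(sum_ij |e_ij|^p)^(1/p), taken over the entries of E."""
    return float(np.sum(np.abs(E) ** p) ** (1.0 / p))

X = np.array([[3.0, 0.0],
              [0.0, 4.0]])      # singular values are 4 and 3
nuclear = schatten_p(X, 1)      # 3 + 4
frobenius = schatten_p(X, 2)    # sqrt(9 + 16)
abs_sum = entrywise_lp(X, 1)    # |3| + |4|
```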
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2016
Jin-Xing Liu; Yong Xu; Ying-Lian Gao; Chun-Hou Zheng; Dong Wang; Qi Zhu
With the development of deep sequencing technologies, large amounts of RNA-Seq data have been generated. Researchers have proposed many methods based on sparse theory to identify the differentially expressed genes from these data. In order to improve the performance of sparse principal component analysis, in this paper, we propose a novel class-information-based sparse component analysis (CISCA) method which introduces the class information via a total scatter matrix. First, CISCA normalizes the RNA-Seq data by using a Poisson model to obtain their differential sections. Second, the total scatter matrix is obtained by combining the between-class and within-class scatter matrices. Third, we decompose the total scatter matrix by using singular value decomposition and construct a new data matrix by using the singular values and left singular vectors. Then, to obtain sparse components, CISCA decomposes the constructed data matrix by solving an optimization problem with sparse constraints on the loading vectors. Finally, the differentially expressed genes are identified by using the sparse loading vectors. The results on simulated and real RNA-Seq data demonstrate that our method is effective and suitable for analyzing these data.
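The second and third steps above can be sketched as follows, assuming that "combining" means the plain sum of the between-class and within-class scatter matrices (the abstract does not say whether the paper weights them). The function name and toy counts are hypothetical, and the sketch stops at the singular value decomposition because the abstract does not specify how the new data matrix is built from the singular values and vectors.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_b, S_w) for samples in the rows of X with class labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    n_features = X.shape[1]
    S_b = np.zeros((n_features, n_features))
    S_w = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)   # between-class term
        centered = Xc - mean_c
        S_w += centered.T @ centered           # within-class term
    return S_b, S_w

# Toy counts (assumed): 12 samples x 6 genes, two classes of 6 samples.
rng = np.random.default_rng(1)
X = rng.poisson(lam=20, size=(12, 6)).astype(float)
y = np.array([0] * 6 + [1] * 6)

S_b, S_w = scatter_matrices(X, y)
S_t = S_b + S_w                    # total scatter as the plain sum
U, s, _ = np.linalg.svd(S_t)       # left singular vectors and singular values
```

With the plain sum, `S_t` equals the scatter of all samples about the global mean, which is a convenient sanity check.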
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2018
Jin-Xing Liu; Dong Wang; Ying-Lian Gao; Chun-Hou Zheng; Yong Xu; Jiguo Yu
Non-negative Matrix Factorization (NMF), a classical method for dimensionality reduction, has been applied in many fields. It is based on the idea that negative numbers are physically meaningless in various data-processing tasks. Apart from its contribution to conventional data analysis, the recent overwhelming interest in NMF is due to its newly discovered ability to solve challenging data mining and machine learning problems, especially in relation to gene expression data. This survey paper mainly focuses on research examining the application of NMF to identify differentially expressed genes and to cluster samples. It summarizes the main NMF models, properties, principles, and algorithms, along with their various generalizations, extensions, and modifications. The experimental results demonstrate the performance of the various NMF algorithms in identifying differentially expressed genes and clustering samples.
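For orientation, here is a minimal sketch of basic NMF using the classical multiplicative updates for the Frobenius-norm objective. It is one standard baseline, not any specific variant covered by the survey; the function name, defaults, and toy matrix are assumptions.

```python
import numpy as np

def nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor a nonnegative X (m x n) as W (m x rank) @ H (rank x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Toy nonnegative matrix (assumed): 30 genes x 12 samples, rank at most 4.
rng = np.random.default_rng(42)
X = rng.random((30, 4)) @ rng.random((4, 12))
W, H, errors = nmf(X, rank=4)
```

In a gene expression setting, the rows of `W` are usually read as gene loadings and the columns of `H` as sample encodings.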
Neurocomputing | 2017
Jin-Xing Liu; Dong Wang; Ying-Lian Gao; Chun-Hou Zheng; Junliang Shang; Feng Liu; Yong Xu
It is urgent to effectively identify differentially expressed genes from RNA-Seq data. In this paper, we proposed a novel method, joint-L2,1-norm-constraint-based semi-supervised feature extraction (L21SFE), to analyze RNA-Seq data. Our scheme is as follows. Firstly, we constructed a graph Laplacian matrix and refined it by using the labeled samples. Our graph construction method can make full use of a large number of unlabeled samples. Secondly, we found semi-supervised optimal maps by solving a generalized eigenvalue problem. Thirdly, we solved an optimization problem via the joint L2,1-norm constraint to obtain a projection matrix. The L2,1-norm constraint diminishes the impact of noise and outliers and produces more precise results. Finally, we identified differentially expressed genes based on the projection matrix. The results on simulated and real RNA-Seq data sets demonstrated the feasibility and effectiveness of our method.
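The L2,1-norm used here is the sum of the Euclidean norms of a matrix's rows, which is why penalizing it pushes whole rows toward zero. The sketch below computes it and ranks rows by their norm; treating rows as genes and ranking them this way is an assumption for illustration, not the paper's stated selection rule.

```python
import numpy as np

def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def row_scores(P):
    """Euclidean norm of each row; all-zero rows score exactly zero."""
    return np.linalg.norm(P, axis=1)

# Toy projection matrix (assumed): 3 genes x 2 components.
P = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
total = l21_norm(P)                      # 5 + 0 + 13
ranking = np.argsort(-row_scores(P))     # rows ordered from largest norm
```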
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2016
Dong Wang; Jin-Xing Liu; Ying-Lian Gao; Chun-Hou Zheng; Yong Xu
Many methods have been considered for gene selection and analysis of gene expression data. Nonetheless, there is still considerable room for improving the explicitness and reliability of gene selection. To this end, this paper proposes a novel method named robust graph regularized non-negative matrix factorization for characteristic gene selection using gene expression data, which has two main aspects. Firstly, it enforces L21-norm minimization on the error function, which is robust to outliers and noise in the data points. Secondly, it considers that the samples lie on a low-dimensional manifold embedded in a high-dimensional ambient space, and reveals the geometric structure embedded in the original data. To demonstrate the validity of the proposed method, we apply it to gene expression data sets involving various human normal and tumor tissue samples; the results demonstrate that the method is effective and feasible.
PLOS ONE | 2016
Dong Wang; Jin-Xing Liu; Ying-Lian Gao; Jiguo Yu; Chun-Hou Zheng; Yong Xu
Recent research has demonstrated that characteristic gene selection based on gene expression data remains faced with considerable challenges. This is primarily because gene expression data are typically high dimensional, negative, non-sparse and noisy. However, existing methods for data analysis are able to cope with only some of these challenges. In this paper, we address all of these challenges with a unified method: nonnegative matrix factorization via the L2,1-norm (NMF-L2,1). Because L2,1-norm minimization is applied to both the error function and the regularization term, our method is robust to outliers and noise in the data and generates sparse results. The application of our method to plant and tumor gene expression data demonstrates that NMF-L2,1 can extract more characteristic genes than other existing state-of-the-art methods.
IEEE Transactions on Nanobioscience | 2016
Jin-Xing Liu; Ying-Lian Gao; Chun-Hou Zheng; Yong Xu; Jiguo Yu
The Cancer Genome Atlas (TCGA) dataset provides more opportunities to systematically and comprehensively study the biological mechanisms of cancer formation, growth and metastasis. Since the TCGA dataset includes heterogeneous data, mining meaningful information from it is one of the bottlenecks in bioinformatics. In this paper, to improve the performance of Robust Principal Component Analysis (RPCA) in analyzing these heterogeneous data, a modified RPCA-based method, Block-Constraint Robust Principal Component Analysis (BCRPCA), is proposed. Since different categories of data have different peculiarities, BCRPCA enforces different constraint intensities on different categories to improve the performance of RPCA. Firstly, the observation matrix of TCGA data is decomposed into two additive matrices A and S by using BCRPCA. Secondly, we use a ranking scheme to evaluate every feature and project these features to the genes. Then, the genes with high scores are identified as differentially expressed ones. The main contributions of this paper are as follows: firstly, it proposes, for the first time, the idea and method of BCRPCA to model TCGA data; secondly, it provides a BCRPCA-based framework for integrated analysis of TCGA data. The results show that our method is effective and suitable for analyzing these data.
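To make the decomposition into A and S concrete, the sketch below implements plain RPCA (not BCRPCA) with an inexact augmented Lagrangian scheme, assuming A is the low-rank part and S the sparse part. It uses a single sparsity weight `lam` for the whole matrix, whereas the block-constraint idea described above would apply different weights to different data categories; that extension is not shown. The function name, defaults, and synthetic data are assumptions.

```python
import numpy as np

def rpca(M, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Split M into low-rank A plus sparse S (principal component pursuit)."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # common default weight
    norm_fro = np.linalg.norm(M)
    norm_two = np.linalg.norm(M, 2)
    Y = M / max(norm_two, np.abs(M).max() / lam)   # dual variable
    mu = 1.25 / norm_two
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank step: singular value thresholding.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        A = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse step: entrywise soft-thresholding.
        T = M - A + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        R = M - A - S
        Y = Y + mu * R
        mu *= rho
        if np.linalg.norm(R) <= tol * norm_fro:
            break
    return A, S

# Synthetic check (assumed data): rank-2 matrix plus 5% sparse corruption.
rng = np.random.default_rng(0)
A_true = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S_true = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
S_true[mask] = rng.uniform(-10, 10, size=mask.sum())
M = A_true + S_true

A_hat, S_hat = rpca(M)
rel_err = np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true)
```

A ranking scheme like the one mentioned in the abstract would then score features from the entries of `S_hat`; how the paper does this is not specified here.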
Computational Biology and Chemistry | 2016
Ya-Xuan Wang; Jin-Xing Liu; Ying-Lian Gao; Chun-Hou Zheng; Jun-Liang Shang
With the rapid development of DNA microarray technology and next-generation technology, large amounts of genomic data have been generated. How to extract more differentially expressed genes from genomic data has therefore become a matter of urgency. Because Low-Rank Representation (LRR) performs well in studying low-dimensional subspace structures, it has attracted considerable attention in recent years. However, it does not take into consideration the intrinsic geometric structures in the data. In this paper, a new method named Laplacian regularized Low-Rank Representation (LLRR), which introduces graph regularization into LRR, is proposed and applied to genomic data. By taking full advantage of the graph regularization, the LLRR method can capture the intrinsic non-linear geometric information in the data. The LLRR method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix by solving an optimization problem. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as the sparse perturbation signals. Therefore, the differentially expressed genes can be selected according to the sparse matrix. Finally, we use the GO tool to analyze the selected genes and compare the P-values with those of other methods. The results on simulated data and two real genomic data sets illustrate that this method outperforms some other methods in differentially expressed gene selection.
International Conference on Intelligent Computing | 2015
Dong Wang; Ying-Lian Gao; Jin-Xing Liu; Jiguo Yu; Chang-Gang Wen
Nonnegative matrix factorization (NMF) has become a popular method that is widely used in many fields because it can deal with many high-dimensional, non-negative problems. However, in real gene expression data applications, we often have to deal with geometric structure problems. Thus, a graph regularized version of NMF is needed. In this paper, we propose Graph Regularized Non-negative Matrix Factorization (GRNMF), which places graph regularization on the error function, to extract characteristic gene sets. This method considers the samples to lie on a low-dimensional manifold embedded in a high-dimensional ambient space, and reveals the geometric structure embedded in the original data. Experimental results on tumor data sets and plant gene expression data demonstrate that our GRNMF model can extract more differential genes than other existing state-of-the-art methods.
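A graph regularized NMF can be sketched as below, using the commonly cited formulation that adds a graph Laplacian penalty on the sample factor to the Frobenius error and solves it with multiplicative updates. Whether the GRNMF of this paper uses exactly this objective is not stated in the abstract, so the formulation, the binary k-nearest-neighbour graph, the function names, and all defaults are assumptions.

```python
import numpy as np

def knn_graph(samples, k=3):
    """Symmetric binary k-nearest-neighbour affinity matrix (rows = samples)."""
    sq = np.sum((samples[:, None, :] - samples[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(sq, np.inf)                 # exclude self-matches
    idx = np.argsort(sq, axis=1)[:, :k]
    n = samples.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)

def gnmf(X, rank, lam=1.0, k=3, n_iter=200, seed=0, eps=1e-10):
    """X (genes x samples) ~ U @ V.T with a graph penalty on the sample factor V."""
    A = knn_graph(X.T, k)
    D = np.diag(A.sum(axis=1))
    Lap = D - A                                  # graph Laplacian
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], rank)) + eps
    V = rng.random((X.shape[1], rank)) + eps
    history = []
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * A @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
        fit = np.linalg.norm(X - U @ V.T) ** 2
        history.append(fit + lam * np.trace(V.T @ Lap @ V))
    return U, V, history

# Toy nonnegative data (assumed): 40 genes x 15 samples.
rng = np.random.default_rng(7)
X = rng.random((40, 3)) @ rng.random((3, 15))
U, V, history = gnmf(X, rank=3)
```

The penalty term encourages samples that are neighbours in the original space to receive similar rows in `V`, which is the sense in which the factorization respects the geometric structure of the data.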