Runyu Jing
Sichuan University
Publication
Featured research published by Runyu Jing.
Computational Biology and Chemistry | 2012
Wenli Qin; Yizhou Li; Juan Li; Lezheng Yu; Di Wu; Runyu Jing; Xuemei Pu; Yanzhi Guo; Menglong Li
Signal peptides play a crucial role in various biological processes, such as localization of cell surface receptors, translocation of secreted proteins and cell-cell communication. However, amino acid mutations in signal peptides, also called non-synonymous single nucleotide polymorphisms (nsSNPs or SAPs), may lead to the loss of their functions. In the present study, a computational method based on random forest (RF) was proposed for predicting deleterious nsSNPs in signal peptides by incorporating the position-specific scoring matrix (PSSM) profile, SignalP score and physicochemical properties. These features were optimized by the maximum relevance minimum redundancy (mRMR) method. Then, a cost matrix was used to minimize the effect of the imbalanced data classification problem that usually occurs in nsSNP prediction. With an optimal subset of 10 features, the method achieved an overall accuracy of 84.5% and an area under the ROC curve (AUC) of 0.822 in the jackknife test. Furthermore, on the same dataset, we compared our predictor with existing methods, including the R-score-based and D-score-based methods, and our method outperformed both. The satisfactory performance suggests that our method is effective in predicting deleterious nsSNPs in signal peptides.
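As an illustration of how a cost matrix can counter class imbalance, the sketch below picks the class with the lowest expected misclassification cost from a predicted probability. It is not the authors' implementation: the function name `min_cost_class` and the cost values in `COST` are hypothetical, and the probability is assumed to come from any classifier (for example, a random forest vote fraction).

```python
def min_cost_class(p_deleterious, cost):
    """Return the class (0 = neutral, 1 = deleterious) with the lowest expected cost.

    cost[t][k] is the cost of predicting class k when the true class is t.
    """
    p = [1.0 - p_deleterious, p_deleterious]
    expected = [sum(p[t] * cost[t][k] for t in range(2)) for k in range(2)]
    # min() returns the first index on ties, i.e. class 0
    return min(range(2), key=lambda k: expected[k])


# Hypothetical cost matrix: missing a deleterious variant costs 5x a false alarm.
COST = [
    [0.0, 1.0],  # true neutral:     correct, false alarm
    [5.0, 0.0],  # true deleterious: miss, correct
]

label_a = min_cost_class(0.30, COST)  # expected costs: 1.5 (neutral) vs 0.7 (deleterious)
label_b = min_cost_class(0.10, COST)  # expected costs: 0.5 (neutral) vs 0.9 (deleterious)
```

With these assumed costs the decision threshold drops from 0.5 to 1/6, so a sample with only a 30% predicted probability is still labeled deleterious.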
Scientific Reports | 2016
Junmei Xu; Runyu Jing; Yuan Liu; Yongcheng Dong; Zhining Wen; Menglong Li
The interactions among the genes involved in a disease help in understanding the hierarchical structure of its complex biological system. Most current methodologies need information on known interactions between genes or proteins to create the network connections. However, these methods are limited in clinical cancer research because different cancers not only share common gene interactions but also have their own specific interactions that distinguish them from each other. Moreover, it is still difficult to decide the boundaries of the sub-networks. Therefore, we proposed a strategy to construct a gene network by using the sparse inverse covariance matrix of gene expression data, and to divide it into a series of functional modules by an adaptive partition algorithm. The strategy was validated by using the microarray data of three cancers and the RNA-sequencing data of glioblastoma. The different modules in the network exhibited specific functions in cancer progression. Moreover, based on the gene expression profiles in the modules, the risk of death was well predicted in the clustering analysis and the binary classification, indicating that our strategy can benefit the investigation of cancer mechanisms and promote the clinical application of network-based methodologies in cancer research.
Analytical Methods | 2014
Yuelong Wang; Runyu Jing; Yongpan Hua; Yuanyuan Fu; Xu Dai; Liqiu Huang; Menglong Li
Multi-family enzymes are of great importance in life, disease and other domains. However, in enzyme classification, the information on multi-family enzymes is usually removed from the dataset because of the limitations of traditional single-label prediction methods. In order to predict multiple classes of multi-family enzymes, we adopted two multi-label learning algorithms, namely RAkEL-RF and MLKNN, and two types of protein descriptors, namely CTD and PseAAC, to generate four predictors: RAkEL-RF-CTD, RAkEL-RF-PseAAC, MLKNN-CTD and MLKNN-PseAAC. When the four predictors were tested on a training set with 10-fold cross validation, the overall success rates reached 97.99%, 96.07%, 96.01% and 95.31%, respectively. For the independent test set, the corresponding rates reached 97.57%, 95.03%, 95.9% and 93.9%, respectively. The extremely small difference between the two sets for each predictor, together with the relatively high accuracy, demonstrates the prediction capability and robustness of our predictors. In addition, three of seven pairs of homologous enzymes with different functions and eighteen of twenty-three distantly related enzymes with a similar family were correctly classified by the RAkEL-RF-CTD predictor. These results indicate the broad applicability of our predictors.
Computational and Mathematical Methods in Medicine | 2013
Jing Sun; Runyu Jing; Di Wu; Tuanfei Zhu; Menglong Li; Yizhou Li
The main objective of this study is to explore the contribution of complex networks, together with different definitions of vertices and edges, to describing the structure of proteins. A protein folds into a specific conformation for its function depending on interactions between residues. Consequently, in many studies, a protein structure was treated as a complex system composed of individual components (residues), with edges representing the interactions between residues. What is the proper time for representing a protein structure as a network? To confirm the effect of different definitions of vertices and edges in constructing the amino acid interaction networks, protein domains and the structural units of proteins were described using this method. The identification performance on 2847 proteins with one or more domains showed that the structure of proteins was described well when R_Cα was around 5.0–7.5 Å. In this study, the optimal cutoff value for constructing the protein structure networks was 5.0 Å (Cα-Cα distances), and the ideal community division method was community structure detection based on edge betweenness.
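The network construction described above can be sketched as follows: residues are vertices, and an edge joins two residues whose Cα atoms lie within a distance cutoff. This is a minimal illustration, not the authors' code. The function name `contact_edges`, the toy coordinates, and the choice of an inclusive (`<=`) comparison are assumptions; only the 5.0 Å cutoff comes from the abstract.

```python
import math
from itertools import combinations


def contact_edges(ca_coords, cutoff=5.0):
    """Return residue index pairs (i, j), i < j, whose C-alpha atoms are within cutoff (in angstroms)."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(ca_coords), 2):
        if math.dist(a, b) <= cutoff:  # Euclidean distance, Python 3.8+
            edges.append((i, j))
    return edges


# Toy C-alpha coordinates in angstroms (hypothetical, not taken from a real structure).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 3.0, 0.0),
]

edges = contact_edges(coords, cutoff=5.0)
```

The resulting edge list could then be passed to a community detection routine, such as one based on edge betweenness, to split the network into candidate structural units.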
Computational Biology and Chemistry | 2013
Jing Sun; Runyu Jing; Yuelong Wang; Tuanfei Zhu; Menglong Li; Yizhou Li
Domains are the structural basis of the physiological functions of proteins, and predicting them benefits the study of protein structure and function. This article proposes a new, fully automatic prediction method, PPM-Dom (Domain Position Prediction Method), for predicting the positions of domains in a target protein from its atomic coordinates. The presented method integrates complex networks, community division, and the fuzzy mean operator (FMO). The whole sequence is divided into potential domain regions by the complex network and community division, and FMO makes the final determination of the domain positions. Based solely on the atomic coordinate information of the query sequence, the method can predict which regions will form a domain structure and which are unstructured, and can separate different domains in the same query sequence from each other. When evaluated on an independent testing dataset, PPM-Dom reached 91.41% prediction accuracy, 96.12% sensitivity and 92.86% specificity. The PPM-Dom toolkit is freely available at http://cic.scu.edu.cn/bioinformatics/PPMDom.zip.
RSC Advances | 2016
Minqi Wang; Xuan He; Qing Xiong; Runyu Jing; Yuxiang Zhang; Zhining Wen; Qifan Kuang; Xuemei Pu; Menglong Li; Tao Xu
We report a facile yet effective strategy that combines Fourier transform-infrared spectroscopy (FTIR) with multi-label algorithms, through which multiple components in polymer bonded explosives (PBXs) can be rapidly and simultaneously identified with high accuracy. The explosive components include 1,3,5,7-tetranitro-1,3,5,7-tetraazacyclo-octane (HMX), hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX), 2,4,6-triamino-1,3,5-trinitrobenzene (TATB) and 2,4,6-trinitrotoluene (TNT), present in single-component, binary-component and ternary-component PBXs. The training set contains 354 FTIR spectra of the explosives, while the independent test set contains 84. Two multi-label strategies (viz., data decomposition and algorithm adaptation) were adopted to construct the classification model, with the objective of testing their efficiency in the multi-classification application. Principal component analysis (PCA) was applied to reduce the number of variables. Both strategies exhibit excellent performance, with 100% accuracy for the training and the independent test sets. However, for real PBX samples, the accuracy of the algorithm adaptation strategy drops sharply to 40%. Notably, the data decomposition strategy still achieves 100% accuracy for the real samples, showing stronger robustness to background interference and high promise in practice. The strategy proposed in this work would provide valuable information for advancing analytical methods in explosive detection systems and for other complicated samples.
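One common form of data decomposition for multi-label problems is binary relevance, where each component gets its own yes/no target and the per-component predictions are recombined into a label set. The abstract does not say which decomposition was used, so the sketch below is an assumption for illustration only; `decompose`, `recombine` and the toy samples are hypothetical names and data.

```python
COMPONENTS = ("HMX", "RDX", "TATB", "TNT")


def decompose(samples):
    """Turn a list of component sets into one binary target vector per component."""
    return {c: [1 if c in s else 0 for s in samples] for c in COMPONENTS}


def recombine(binary_predictions):
    """Merge per-component binary predictions back into one component set per sample."""
    n = len(next(iter(binary_predictions.values())))
    return [{c for c in COMPONENTS if binary_predictions[c][i]} for i in range(n)]


# Toy labels: a single-component, a binary and a ternary mixture.
samples = [{"HMX"}, {"RDX", "TNT"}, {"HMX", "TATB", "TNT"}]

targets = decompose(samples)     # e.g. targets["TNT"] marks which samples contain TNT
restored = recombine(targets)    # perfect binary predictions recover the original sets
```

In practice each binary target would be fitted by its own classifier on the (PCA-reduced) spectra, so an error on one component does not affect the others.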
Biomarkers in Medicine | 2015
Yuan Liu; Runyu Jing; Junmei Xu; Keqin Liu; Jiwei Xue; Zhining Wen; Menglong Li
Aims: Although RNA-sequencing has been widely used to identify differentially expressed genes (DEGs) as biomarkers to guide therapeutic treatment, it is necessary to investigate the concordance of DEGs identified by microarray and RNA-sequencing for clinical prognosis. Materials & methods: By using The Cancer Genome Atlas data sets, we thoroughly investigated the concordance of DEGs identified from microarray and RNA-sequencing data and their molecular functions. Results: The DEGs identified by both technologies averaged ~98.6% overlap. The cancer-related gene sets were significantly enriched with the DEGs and were consistent between the two technologies. Conclusions: The high consistency of DEGs in their regulation directionality and molecular functions indicated good reproducibility between microarray and RNA-sequencing in identifying potential oncogenes for clinical prognosis.
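A concordance check of this kind can be sketched as two numbers: the share of DEGs found by both platforms, and the share of those common genes regulated in the same direction. The abstract does not define how overlap was computed, so the definition used here (shared genes divided by the size of the shorter list) is an assumption, as are the function name `deg_concordance` and the placeholder gene identifiers.

```python
def deg_concordance(deg_a, deg_b):
    """Compare two DEG lists given as dicts mapping gene -> 'up' or 'down'.

    Returns (overlap, same_direction):
      overlap        = shared genes / size of the shorter list (assumed definition)
      same_direction = fraction of shared genes with matching regulation direction
    """
    if not deg_a or not deg_b:
        return 0.0, 0.0
    shared = set(deg_a) & set(deg_b)
    overlap = len(shared) / min(len(deg_a), len(deg_b))
    if not shared:
        return overlap, 0.0
    same_direction = sum(deg_a[g] == deg_b[g] for g in shared) / len(shared)
    return overlap, same_direction


# Placeholder gene identifiers, not real results.
microarray = {"g1": "up", "g2": "down", "g3": "up", "g4": "up"}
rnaseq = {"g1": "up", "g2": "down", "g3": "down"}

overlap, same_direction = deg_concordance(microarray, rnaseq)
```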
Analytical Methods | 2015
Liqiu Huang; Runyu Jing; Yongning Yang; Xuemei Pu; Menglong Li; Zhining Wen; Yi Li
The correct diagnosis and the prompt treatment of oral leukoplakia (OLK) can efficiently prevent OLK from undergoing malignant transformation to oral squamous cell carcinoma (OSCC). However, the diagnostic model for distinguishing normal mucosa from low-grade dysplasia, as well as high-grade dysplasia from OSCC was not well established in a previous study. In this study, the characteristic wavenumbers in the Raman spectra were first identified by the variable selection methods. Then, the intensities at these wavenumbers were used to classify the biopsies. As a result, the accuracies achieved by the intensities at the characteristic wavenumbers were 70.5% and 94.0% for the classification of normal vs. low-grade dysplasia and high-grade dysplasia vs. OSCC, respectively, which were greater than those (accuracy of 65.4% and 88.0%, respectively) using all the intensities in the Raman spectra. Our results suggested that constructing a diagnostic model with the intensities at the characteristic wavenumbers can improve the identification of the different lesions of oral mucosa. Moreover, most of the Raman intensities for predicting normal vs. low-grade dysplasia indicated that the transformation from normal mucosa to low-grade dysplasia was associated with the changes in the contents of lipids, while most of the intensities for predicting high-grade dysplasia vs. OSCC indicated that the transformation from high-grade dysplasia to OSCC was associated with changes in the contents of proteins and nucleic acids. Our findings can be helpful for diagnosing the various grades of OLK with dysplasia and understanding the molecular mechanisms of potential malignant transformation of oral leukoplakia.
Journal of Theoretical and Computational Science | 2014
Runyu Jing; Yuelong Wang; Yiming Wu; Yongpan Hua; Xu Dai; Menglong Li
The B-factor, also called the Debye-Waller factor or the temperature factor, is a descriptor of protein flexibility and is commonly reported in PDB (Protein Data Bank) format files. A B-factor can be measured from a protein crystal by X-ray scattering, but cannot be obtained directly from the protein sequence. Thus, predicting the B-factor from the protein sequence alone could provide a useful reference for researchers in related fields. In this study, we attempt to predict the B-factor based on the protein sequence. The information in AAindex, together with the predicted protein secondary structure, relative accessibility, disorder and energy changes, is used to describe the amino acid residues. Four machine learning methods are used for modeling and prediction, and 5-fold cross validation is used to evaluate the modeling performance. This work provides some new methods for predicting and analyzing the B-factor based on the protein sequence, and we hope that it will be helpful for related research.
Scientific Reports | 2017
Yiming Wu; Runyu Jing; Yongcheng Dong; Qifan Kuang; Yan Li; Ziyan Huang; Wei Gan; Yue Xue; Yizhou Li; Menglong Li
Genome-wide association studies (GWAS) have identified more than sixty single nucleotide polymorphisms (SNPs) associated with increased risk for type 2 diabetes (T2D). However, the identification of causal risk SNPs for T2D pathogenesis is complicated by the fact that each risk SNP is a surrogate for hundreds of SNPs, most of which reside in non-coding regions. Here we provide a comprehensive annotation of 65 known T2D-related SNPs and inspect putative functional SNPs that probably cause protein dysfunction, response element disruptions of known transcription factors related to T2D genes, and regulatory response element disruption of four histone marks in pancreas and pancreatic islet. Some of the newly identified risk SNPs were reported as T2D-related SNPs in recent studies. Further, we found that the accumulation of modest effects of single sites markedly enhanced the risk prediction based on 1989 T2D samples and 3000 healthy controls. Using the genotype score alone, the area under the ROC curve (AROC) increased from 0.58 to 0.62 when the putative risk SNPs were added. In addition, the net reclassification improvement was 10.03% on the addition of the new risk SNPs. Taken together, functional annotation could provide a list of prioritized potential risk SNPs for further estimation of the T2D susceptibility of individuals.
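To make the genotype-score evaluation concrete, the sketch below computes an unweighted score (the sum of risk-allele counts per individual) and the area under the ROC curve as the probability that a case scores higher than a control, counting ties as one half. Whether the study used an unweighted or weighted score is not stated in the abstract, so the unweighted form is an assumption; the function names and toy genotypes are hypothetical.

```python
def genotype_score(risk_allele_counts):
    """Unweighted genotype score: total number of risk alleles (0, 1 or 2 per SNP)."""
    return sum(risk_allele_counts)


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


# Toy genotypes: one row per individual, one risk-allele count per SNP.
cases = [[2, 1, 1], [1, 1, 0], [2, 2, 1]]
controls = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]

case_scores = [genotype_score(g) for g in cases]        # [4, 2, 5]
control_scores = [genotype_score(g) for g in controls]  # [1, 2, 1]
area = auc(case_scores, control_scores)                 # 8.5 wins out of 9 pairs
```

Adding more SNP columns to each row and recomputing `area` is the kind of before-and-after comparison the abstract reports (0.58 versus 0.62 on the study's own data).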