Runyu Jing | Researchain

Publication

Featured research published by Runyu Jing.

Computational Biology and Chemistry | 2012

Predicting deleterious non-synonymous single nucleotide polymorphisms in signal peptides based on hybrid sequence attributes

Wenli Qin; Yizhou Li; Juan Li; Lezheng Yu; Di Wu; Runyu Jing; Xuemei Pu; Yanzhi Guo; Menglong Li

Signal peptides play a crucial role in various biological processes, such as localization of cell surface receptors, translocation of secreted proteins and cell-cell communication. However, amino acid mutations in signal peptides, also called non-synonymous single nucleotide polymorphisms (nsSNPs or SAPs), may lead to the loss of their function. In the present study, a computational method based on random forest (RF) was proposed for predicting deleterious nsSNPs in signal peptides by incorporating position-specific scoring matrix (PSSM) profiles, SignalP scores and physicochemical properties. These features were optimized by the maximum relevance minimum redundancy (mRMR) method. A cost matrix was then used to minimize the effect of the imbalanced data classification problem that usually occurs in nsSNP prediction. With an optimal subset of 10 features, the method achieved an overall accuracy of 84.5% and an area under the ROC curve (AUC) of 0.822 in the jackknife test. Furthermore, on the same dataset, we compared our predictor with existing methods, including the R-score-based and D-score-based methods, and our method outperformed both. The satisfactory performance suggests that our method is effective in predicting deleterious nsSNPs in signal peptides.
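The abstract does not say how the cost matrix was applied. One common way to use a cost matrix with an imbalanced two-class problem is to predict the class with the lowest expected cost rather than the most probable class. The sketch below illustrates that idea only; the function name, the class labels and the 5:1 cost ratio are hypothetical and are not taken from the paper.

```python
def min_expected_cost_class(probs, cost):
    """Return the index of the predicted class with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical setup: class 0 = neutral, class 1 = deleterious (minority class).
# Missing a deleterious variant is assumed to cost five times a false alarm.
COST = [[0.0, 1.0],
        [5.0, 0.0]]
UNIFORM = [[0.0, 1.0],
           [1.0, 0.0]]

weighted_prediction = min_expected_cost_class([0.7, 0.3], COST)
uniform_prediction = min_expected_cost_class([0.7, 0.3], UNIFORM)
```

With the uniform matrix the majority class wins, while the weighted matrix shifts the same probability estimate toward the minority class.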

Scientific Reports | 2016

A new strategy for exploring the hierarchical structure of cancers by adaptively partitioning functional modules from gene expression network.

Junmei Xu; Runyu Jing; Yuan Liu; Yongcheng Dong; Zhining Wen; Menglong Li

The interactions among genes within a disease help in understanding the hierarchical structure of its complex biological system. Most current methodologies need information on known interactions between genes or proteins to create the network connections. However, these methods are limited in clinical cancer research because different cancers not only share common gene interactions but also have their own specific interactions that distinguish them from each other. Moreover, it is still difficult to decide the boundaries of the sub-networks. Therefore, we proposed a strategy to construct a gene network by using the sparse inverse covariance matrix of gene expression data and to divide it into a series of functional modules with an adaptive partition algorithm. The strategy was validated using the microarray data of three cancers and the RNA-sequencing data of glioblastoma. The different modules in the network exhibited specific functions in cancer progression. Moreover, based on the gene expression profiles in the modules, the risk of death was well predicted in the clustering analysis and the binary classification, indicating that our strategy can benefit the investigation of cancer mechanisms and promote the clinical application of network-based methodologies in cancer research.

Analytical Methods | 2014

Classification of multi-family enzymes by multi-label machine learning and sequence-based descriptors

Yuelong Wang; Runyu Jing; Yongpan Hua; Yuanyuan Fu; Xu Dai; Liqiu Huang; Menglong Li

Multi-family enzymes are of great importance in life, disease and other domains. However, in enzyme classification, multi-family enzymes are always removed from the dataset because of the limitations of traditional single-label prediction methods. In order to predict multiple classes of multi-family enzymes, we adopted two multi-label learning algorithms, RAkEL-RF and MLKNN, and two types of protein descriptors, CTD and PseAAC, to generate four predictors: RAkEL-RF-CTD, RAkEL-RF-PseAAC, MLKNN-CTD and MLKNN-PseAAC. When the four predictors were tested on the training set with 10-fold cross-validation, the overall success rates reached 97.99%, 96.07%, 96.01% and 95.31%, respectively. For the independent test set, the corresponding rates reached 97.57%, 95.03%, 95.9% and 93.9%, respectively. The extremely small difference between the two sets for each predictor, together with the relatively high accuracy, demonstrates the prediction capability and robustness of our predictors. In addition, three of seven pairs of homologous enzymes with different functions and eighteen of twenty-three distantly related enzymes with a similar family were correctly classified by the RAkEL-RF-CTD predictor. These results indicate the extensive applicability of our predictors.

Computational and Mathematical Methods in Medicine | 2013

The Effect of Edge Definition of Complex Networks on Protein Structure Identification

Jing Sun; Runyu Jing; Di Wu; Tuanfei Zhu; Menglong Li; Yizhou Li

The main objective of this study is to explore how complex networks, with different definitions of vertices and edges, contribute to describing the structure of proteins. A protein folds into a specific conformation for its function depending on interactions between residues. Consequently, in many studies, a protein structure was treated as a complex system in which the individual components were residues and the edges were interactions between residues. What is the proper time for representing a protein structure as a network? To confirm the effect of different definitions of vertices and edges in constructing amino acid interaction networks, protein domains and the structural units of proteins were described using this method. The identification performance on 2847 proteins with one or more domains showed that the structure of proteins was described well when R Cα was around 5.0–7.5 Å, and that the optimal cutoff value for constructing the protein structure networks was 5.0 Å (Cα-Cα distances), while the ideal community division method in this study was community structure detection based on edge betweenness.
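To make the edge definition concrete, the following is a minimal sketch of building a residue contact network from Cα coordinates with a distance cutoff. It assumes that an edge joins two residues whenever their Cα-Cα distance is at or below the cutoff; the function name, the inclusive comparison and the toy coordinates are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def contact_edges(ca_coords, cutoff=5.0):
    """Return residue index pairs whose C-alpha atoms lie within `cutoff` angstroms."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(ca_coords), 2):
        # math.dist is the Euclidean distance between two points (Python 3.8+).
        if math.dist(a, b) <= cutoff:
            edges.append((i, j))
    return edges


# Hypothetical C-alpha coordinates (angstroms) for a four-residue fragment.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]

edges_5_0 = contact_edges(coords, cutoff=5.0)
edges_7_5 = contact_edges(coords, cutoff=7.5)
```

Raising the cutoff from 5.0 Å to 7.5 Å adds edges between residues that are not direct neighbours, which is the kind of change in network density that the choice of edge definition controls.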

Computational Biology and Chemistry | 2013

PPM-Dom

Jing Sun; Runyu Jing; Yuelong Wang; Tuanfei Zhu; Menglong Li; Yizhou Li

Domains are the structural basis of the physiological functions of proteins, and predicting them benefits the study of protein structure and function. This article proposes a new, fully automatic prediction method, PPM-Dom (Domain Position Prediction Method), for predicting the positions of domains in a target protein from its atomic coordinates. The method integrates complex networks, community division, and a fuzzy mean operator (FMO). The whole sequence is divided into potential domain regions by the complex network and community division, and the FMO makes the final determination of the domain positions. Based on the atomic coordinate information of a new query sequence, the method can predict which regions will form a domain structure and which are unstructured, and can separate different domains in the same query sequence from each other. When evaluated on an independent testing dataset, PPM-Dom reached 91.41% prediction accuracy, 96.12% sensitivity and 92.86% specificity. The PPM-Dom toolkit is freely available at http://cic.scu.edu.cn/bioinformatics/PPMDom.zip.

RSC Advances | 2016

A facile strategy applied to simultaneous qualitative-detection on multiple components of mixture samples: a joint study of infrared spectroscopy and multi-label algorithms on PBX explosives

Minqi Wang; Xuan He; Qing Xiong; Runyu Jing; Yuxiang Zhang; Zhining Wen; Qifan Kuang; Xuemei Pu; Menglong Li; Tao Xu

We report a facile yet effective strategy that combines Fourier transform-infrared spectroscopy (FTIR) with multi-label algorithms, through which multiple components in polymer bonded explosives (PBXs) can be rapidly and simultaneously identified with high accuracy. The explosive components include 1,3,5,7-tetranitro-1,3,5,7-tetraazacyclo-octane (HMX), hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX), 2,4,6-triamino-1,3,5-trinitrobenzene (TATB) and 2,4,6-trinitrotoluene (TNT) involved in single-component, binary-component and ternary-component PBXs. The training set contains 354 FTIR spectra of the explosives while the independent test set contains 84. Two multi-label strategies (viz., data decomposition and algorithm adaptation) were adopted to construct the classification model, with the objective of testing their efficiency in the multi-classification application. Principal component analysis (PCA) was applied to reduce the number of variables. Both strategies exhibit excellent performance, with 100% accuracy for the training and independent test sets. However, for real PBX samples, the accuracy of the algorithm adaptation strategy drops sharply to 40%. Notably, the data decomposition strategy still achieves 100% accuracy for the real samples, exhibiting stronger robustness to background interference and high promise in practice. The proposed strategy would provide valuable information for advancing analytical methods in explosive detection systems and for other complicated samples.

Biomarkers in Medicine | 2015

Comparative analysis of oncogenes identified by microarray and RNA-sequencing as biomarkers for clinical prognosis

Yuan Liu; Runyu Jing; Junmei Xu; Keqin Liu; Jiwei Xue; Zhining Wen; Menglong Li

AIMS: Although RNA-sequencing has been widely used to identify differentially expressed genes (DEGs) as biomarkers to guide therapeutic treatment, it is necessary to investigate the concordance of DEGs identified by microarray and RNA-sequencing for clinical prognosis. MATERIAL & METHODS: Using The Cancer Genome Atlas data sets, we thoroughly investigated the concordance of DEGs identified from microarray and RNA-sequencing data and their molecular functions. RESULTS: The DEGs identified by both technologies averaged ~98.6% overlap. The cancer-related gene sets were significantly enriched with the DEGs and consistent between the two technologies. CONCLUSIONS: The high consistency of DEGs in their regulation directionality and molecular functions indicated good reproducibility between microarray and RNA-sequencing in identifying potential oncogenes for clinical prognosis.

Analytical Methods | 2015

Characteristic wavenumbers of Raman spectra reveal the molecular mechanisms of oral leukoplakia and can help to improve the performance of diagnostic models

Liqiu Huang; Runyu Jing; Yongning Yang; Xuemei Pu; Menglong Li; Zhining Wen; Yi Li

The correct diagnosis and the prompt treatment of oral leukoplakia (OLK) can efficiently prevent OLK from undergoing malignant transformation to oral squamous cell carcinoma (OSCC). However, the diagnostic model for distinguishing normal mucosa from low-grade dysplasia, as well as high-grade dysplasia from OSCC was not well established in a previous study. In this study, the characteristic wavenumbers in the Raman spectra were first identified by the variable selection methods. Then, the intensities at these wavenumbers were used to classify the biopsies. As a result, the accuracies achieved by the intensities at the characteristic wavenumbers were 70.5% and 94.0% for the classification of normal vs. low-grade dysplasia and high-grade dysplasia vs. OSCC, respectively, which were greater than those (accuracy of 65.4% and 88.0%, respectively) using all the intensities in the Raman spectra. Our results suggested that constructing a diagnostic model with the intensities at the characteristic wavenumbers can improve the identification of the different lesions of oral mucosa. Moreover, most of the Raman intensities for predicting normal vs. low-grade dysplasia indicated that the transformation from normal mucosa to low-grade dysplasia was associated with the changes in the contents of lipids, while most of the intensities for predicting high-grade dysplasia vs. OSCC indicated that the transformation from high-grade dysplasia to OSCC was associated with changes in the contents of proteins and nucleic acids. Our findings can be helpful for diagnosing the various grades of OLK with dysplasia and understanding the molecular mechanisms of potential malignant transformation of oral leukoplakia.

Journal of Theoretical and Computational Science | 2014

A Research of Predicting the B-factor Base on the Protein Sequence

Runyu Jing; Yuelong Wang; Yiming Wu; Yongpan Hua; Xu Dai; Menglong Li

The B-factor, also called the Debye-Waller factor or the temperature factor, is a descriptor of protein flexibility and is commonly used in PDB (Protein Data Bank) format files. A B-factor can be measured from a protein crystal by X-ray scattering, but cannot be obtained directly from the protein sequence. Thus, predicting the B-factor from the protein sequence alone could provide a useful reference for researchers in related fields. In this study, we attempt to predict the B-factor based on the protein sequence. The information in AAindex and the predicted protein secondary structure, relative accessibility, disorder and energy changes are used to describe the amino acid residues. Four machine learning methods are used for modeling and prediction, and 5-fold cross-validation is used to evaluate the modeling performance. This work provides some new methods for predicting and analyzing the B-factor based on the protein sequence, and we hope that it will be helpful for related research.

Scientific Reports | 2017

Functional annotation of sixty-five type-2 diabetes risk SNPs and its application in risk prediction

Yiming Wu; Runyu Jing; Yongcheng Dong; Qifan Kuang; Yan Li; Ziyan Huang; Wei Gan; Yue Xue; Yizhou Li; Menglong Li

Genome-wide association studies (GWAS) have identified more than sixty single nucleotide polymorphisms (SNPs) associated with increased risk for type 2 diabetes (T2D). However, the identification of causal risk SNPs for T2D pathogenesis is complicated by the fact that each risk SNP is a surrogate for hundreds of SNPs, most of which reside in non-coding regions. Here we provide a comprehensive annotation of 65 known T2D-related SNPs and inspect putative functional SNPs that probably cause protein dysfunction, response element disruptions of known transcription factors related to T2D genes, and regulatory response element disruptions of four histone marks in pancreas and pancreatic islets. Some of the newly identified risk SNPs were reported as T2D-related SNPs in recent studies. Further, we found that the accumulation of modest effects of single sites markedly enhanced risk prediction based on 1989 T2D samples and 3000 healthy controls. When putative risk SNPs were added, the AROC value (area under the ROC curve) increased from 0.58 to 0.62 using the genotype score alone. In addition, the net reclassification improvement was 10.03% on the addition of the new risk SNPs. Taken together, functional annotation could provide a list of prioritized potential risk SNPs for further estimation of the T2D susceptibility of individuals.
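As an illustration of how a genotype score can be evaluated by the area under the ROC curve, the sketch below assumes an unweighted score (the count of risk alleles across SNPs) and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The abstract does not specify the scoring scheme, so the function names, the unweighted sum and the toy genotypes are all hypothetical.

```python
def genotype_score(risk_allele_counts):
    """Unweighted genotype score: total number of risk alleles (0, 1 or 2 per SNP)."""
    return sum(risk_allele_counts)


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy genotypes: one list of risk-allele counts per individual.
cases = [[2, 1, 1], [1, 1, 0], [2, 2, 1]]
controls = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]

case_scores = [genotype_score(g) for g in cases]
control_scores = [genotype_score(g) for g in controls]
toy_auc = auc(case_scores, control_scores)
```

Adding informative SNPs to each genotype list changes the scores, and the resulting change in AUC is the kind of comparison the abstract reports. The pairwise loop is quadratic in sample size, so a rank-based formula would be preferable for cohorts of thousands.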
