Yijie Ding | Researchain

Publication

Featured researches published by Yijie Ding.

International Journal of Molecular Sciences | 2016

Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information

Yijie Ding; Jijun Tang; Fei Guo

Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for detecting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to complement experimental approaches. However, these methods have limitations: they require a large number of homologous proteins or literature sources to be applicable. In this paper, we propose a novel matrix-based protein sequence representation approach to predict PPIs, using an ensemble learning method for classification. We construct the matrix of Amino Acid Contact (AAC), based on the statistical analysis of residue-pairing frequencies in a database of 6323 protein–protein complexes. We first represent the protein sequence as a Substitution Matrix Representation (SMR) matrix. Then, the feature vector is extracted by applying the Histogram of Oriented Gradient (HOG) and Singular Value Decomposition (SVD) algorithms to the SMR matrix. Finally, we feed the feature vector into a Random Forest (RF) to distinguish interaction pairs from non-interaction pairs. Our method is applied to several PPI datasets to evaluate its performance. On the S. cerevisiae dataset, our method achieves 94.83% accuracy and 92.40% sensitivity; compared with existing methods, the accuracy is increased by 0.11 percentage points. On the H. pylori dataset, our method achieves 89.06% accuracy and 88.15% sensitivity, and the accuracy is increased by 0.76%. On the Human PPI dataset, our method achieves 97.60% accuracy and 96.37% sensitivity, and the accuracy is increased by 1.30%. In addition, we test our method on a very important PPI network, and it achieves 92.71% accuracy. In the Wnt-related network, the accuracy of our method is increased by 16.67%. The source code and all datasets are available at https://figshare.com/s/580c11dce13e63cb9a53.
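The matrix-based representation step in this abstract can be illustrated with a minimal sketch. It assumes a toy three-letter alphabet and made-up scores (`TOY_MATRIX` is hypothetical); a real run would use a 20 x 20 matrix such as the AAC matrix described above, and the HOG, SVD and Random Forest stages are not shown.

```python
from typing import Dict, List

# Hypothetical scores over a toy 3-letter alphabet, for illustration only.
# Column order follows the alphabet "AGC".
TOY_MATRIX: Dict[str, List[float]] = {
    "A": [1.0, 0.2, -0.5],
    "G": [0.2, 1.0, 0.1],
    "C": [-0.5, 0.1, 2.0],
}


def sequence_to_matrix(seq: str, matrix: Dict[str, List[float]]) -> List[List[float]]:
    """Return an N x |alphabet| matrix whose i-th row is the score row of residue i."""
    unknown = sorted(set(seq) - set(matrix))
    if unknown:
        raise ValueError(f"residues without a score row: {unknown}")
    # Copy each row so later edits to the result do not alter the score matrix.
    return [list(matrix[res]) for res in seq]


if __name__ == "__main__":
    for row in sequence_to_matrix("AGCA", TOY_MATRIX):
        print(row)
```

Each sequence of length N becomes an N x 3 matrix here (N x 20 with a full amino acid matrix), which is the kind of image-like input that gradient histograms and a singular value decomposition can then summarize into a fixed-length vector.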

Journal of Parallel and Distributed Computing | 2017

Prediction of human protein subcellular localization using deep learning

Leyi Wei; Yijie Ding; Ran Su; Jijun Tang; Quan Zou

Protein subcellular localization (PSL), as one of the most critical characteristics of human cells, plays an important role in understanding specific functions and biological processes in cells. Accurate prediction of protein subcellular localization is a fundamental and challenging problem, for which machine learning algorithms have been widely used. Traditionally, the performance of PSL prediction depends heavily on handcrafted feature descriptors to represent proteins. In recent years, deep learning has emerged as a hot research topic in the field of machine learning, achieving outstanding success in learning high-level latent features within data samples. In this paper, to accurately predict protein subcellular locations, we propose a deep learning based predictor called DeepPSL that uses Stacked Auto-Encoder (SAE) networks. In this predictor, we automatically learn high-level and abstract feature representations of proteins by exploring non-linear relations among diverse subcellular locations, removing the need for handcrafted feature representations. Experimental results evaluated with three-fold cross validation show that the proposed DeepPSL outperforms traditional machine learning based methods. It is expected that DeepPSL, as the first predictor in the field of PSL prediction, has great potential to be a powerful computational method complementary to existing tools.
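The three-fold cross validation mentioned in this abstract can be sketched as follows. This is a generic splitter, not the DeepPSL code: it assumes contiguous, unshuffled folds, and the function name `k_fold_indices` is hypothetical.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds and return (train, test) index lists."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits


if __name__ == "__main__":
    for train, test in k_fold_indices(7, k=3):
        print("train:", train, "test:", test)
```

Every sample is used for testing exactly once and for training in the other two rounds; in practice the indices would usually be shuffled first.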

BMC Bioinformatics | 2016

Predicting protein-protein interactions via multivariate mutual information of protein sequences

Yijie Ding; Jijun Tang; Fei Guo

Background: Protein-protein interactions (PPIs) are central to many biological processes. Many algorithms and methods have been developed to predict PPIs and protein interaction networks. However, the application of most existing methods is limited since they are difficult to compute and rely on a large number of homologous proteins and interaction marks of protein partners. In this paper, we propose a novel sequence-based approach with multivariate mutual information (MMI) of protein feature representation, for predicting PPIs via Random Forest (RF). Methods: Our method constructs a 638-dimensional vector to represent each pair of proteins. First, we cluster the twenty standard amino acids into seven functional groups and transform protein sequences into encoding sequences. Then, we use a novel multivariate mutual information feature representation scheme, combined with normalized Moreau-Broto Autocorrelation, to extract features from protein sequence information. Finally, we feed the feature vectors into a Random Forest model to distinguish interaction pairs from non-interaction pairs. Results: To evaluate the performance of our new method, we conduct several comprehensive tests for predicting PPIs. Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. Our method is applied to the S. cerevisiae PPI dataset, and achieves 95.01% accuracy and 92.67% sensitivity, respectively. For the H. pylori PPI dataset, our method achieves 87.59% accuracy and 86.81% sensitivity, respectively. In addition, we test our method on three other important PPI networks: the one-core network, the multiple-core network, and the crossover network. Conclusions: Compared to the Conjoint Triad method, accuracies of our method are increased by 6.25, 2.06 and 18.75%, respectively. Our proposed method is a useful tool for future proteomics studies.
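The encoding step in this abstract can be sketched as below. Two assumptions are made: the seven groups are the ones commonly used with the Conjoint Triad method (the paper's own grouping may differ), and `pair_information` is a simplified two-group mutual information term over adjacent positions, not the full 638-dimensional feature scheme.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Assumed seven-class grouping (as commonly used with the Conjoint Triad method).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_GROUP = {res: idx for idx, group in enumerate(GROUPS) for res in group}


def encode(seq: str) -> List[int]:
    """Map each residue to the index of its group."""
    return [RESIDUE_TO_GROUP[res] for res in seq]


def pair_information(encoded: List[int]) -> Dict[Tuple[int, int], float]:
    """For each unordered adjacent group pair (a, b), return f(a,b) * ln(f(a,b) / (f(a) * f(b)))."""
    if len(encoded) < 2:
        raise ValueError("need at least two residues")
    singles = Counter(encoded)
    pairs = Counter(tuple(sorted(p)) for p in zip(encoded, encoded[1:]))
    n_single, n_pair = len(encoded), len(encoded) - 1
    info = {}
    for (a, b), count in pairs.items():
        f_ab = count / n_pair
        f_a, f_b = singles[a] / n_single, singles[b] / n_single
        info[(a, b)] = f_ab * math.log(f_ab / (f_a * f_b))
    return info


if __name__ == "__main__":
    print(encode("MKVLAC"))
    print(pair_information(encode("MKVLAC")))
```

Reducing twenty residues to seven groups keeps the number of possible pairs and triples small enough that their frequencies can be estimated from a single sequence.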

Journal of Chemical Information and Modeling | 2015

Identification of Protein-Protein Interactions by Detecting Correlated Mutation at the Interface

Fei Guo; Yijie Ding; Zhao Li; Jijun Tang

Protein-protein interactions play key roles in a multitude of biological processes, such as de novo drug design, immune response, and enzymatic activity. It is of great interest to understand how proteins in a complex interact with each other. Here, we present a novel method for identifying protein-protein interactions, based on typical co-evolutionary information. Correlated mutation analysis can be used to predict interface residues. In this paper, we propose a non-redundant database to detect correlated mutation at the interface. First, we construct structure alignments for one input protein, based on all aligned proteins in the database. Evolutionary distance matrices, one for each input protein, can be calculated through geometric similarity and evolutionary information. Then, we use the evolutionary distance matrices to estimate the correlation coefficient between each pair of fragments from the two input proteins. Finally, we extract interacting residues with high correlation coefficients, which can be grouped into interacting patches. Experiments illustrate that our method achieves better results than some existing co-evolution-based methods. Applied to the SK/RR interaction between sensor kinase and response regulator proteins, our method has accuracy and coverage values of 53% and 45%, which improve upon the accuracy and coverage values of 50% and 30% for the DCA method. We evaluate interface prediction on four protein families, and our method has overall accuracy and coverage values of 34% and 30%, which improve upon the overall accuracy and coverage values of 27% and 21% for PIFPAM. Our method has overall accuracy and coverage values of 59% and 63% on Benchmark v4.0, and 50% and 49% on CAPRI targets. Compared with existing methods, our method improves the overall accuracy value by at least 2%.
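The correlation step in this abstract can be illustrated with a generic sketch: a Pearson correlation between the upper triangles of two symmetric distance matrices built over the same set of aligned proteins. This is an assumed simplification, not the authors' exact procedure, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the strictly upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def distance_matrix_correlation(d1, d2) -> float:
    """Correlate two distance matrices defined over the same ordered set of proteins."""
    return pearson(upper_triangle(d1), upper_triangle(d2))


if __name__ == "__main__":
    d_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
    d_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
    print(distance_matrix_correlation(d_a, d_b))
```

A coefficient close to 1 indicates that the two fragments have diverged in step across the aligned proteins, which is the signal co-evolution methods look for.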

Information Sciences | 2017

Identification of drug-target interactions via multiple information integration

Yijie Ding; Jijun Tang; Fei Guo

Identifying Drug-Target Interactions (DTIs) is an important process in drug discovery. Traditional experimental methods for detecting DTIs are expensive and time-consuming. Therefore, computational approaches provide many effective strategies to deal with this issue. In recent years, most computational methods have used only drug-drug similarity or target-target similarity information, which cannot capture all the characteristics needed to identify DTIs. In this paper, we propose a novel computational model for DTI prediction, based on machine learning methods. To improve prediction performance, we further use molecular substructure fingerprints, Multivariate Mutual Information (MMI) of proteins and network topology to represent drugs, targets and the relationships between them. Moreover, we employ a Support Vector Machine (SVM) and Feature Selection (FS) to construct a model for predicting DTIs. Evaluation experiments show that the proposed approach achieves better results than other outstanding methods for feature-based DTI prediction. The proposed approach achieves AUPRs of 0.899, 0.929, 0.821 and 0.655 on Enzyme, Ion Channel (IC), GPCR and Nuclear Receptor datasets, respectively. Compared with the best existing methods, the AUPR is increased by 0.016 on the Ion Channel dataset. In addition, our method obtains the second best performance on GPCR and Enzyme datasets. The source code and all datasets are available at https://figshare.com/s/53bf5a6065f3911d46f6 .
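The AUPR figures quoted in this abstract summarize a precision-recall curve. A minimal sketch of one common estimator, average precision, is given below; the abstract does not say which estimator or interpolation the authors used, so treat this as an assumption.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average of the precision values at the rank of each true positive.

    `labels` holds 1 for a known interaction and 0 otherwise; `scores` holds
    the predicted scores. Ties keep their input order (the sort is stable).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because known interactions are rare compared with unlabeled drug-target pairs, a precision-based summary of this kind is more informative than plain accuracy.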

PLOS ONE | 2017

Improved detection of DNA-binding proteins via compression technology on PSSM information

Yubo Wang; Yijie Ding; Fei Guo; Leyi Wei; Jijun Tang

Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers have attempted to identify DNA-binding proteins. In recent years, machine learning methods have become more and more compelling as the volume of protein sequence data soars, because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-Specific Scoring Matrix with Discrete Wavelet Transform), and PSSM-DCT (Position-Specific Scoring Matrix with Discrete Cosine Transform). We also apply a feature selection algorithm to these feature vectors. Then, these features are used to train an SVM (support vector machine) classifier to predict DNA-binding proteins. We use three datasets, namely PDB1075, PDB594 and PDB186, to evaluate the performance of our approach. The PDB1075 and PDB594 datasets are employed for the Jackknife test and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the Jackknife test, from 79.20% to 86.23% and 80.5% to 86.20% on PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method reaches 76.3%. The performance on the independent test also shows that our method can be used effectively for DNA-binding protein prediction. The data and source code are at https://doi.org/10.6084/m9.figshare.5104084.
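The NMBAC feature named in this abstract can be sketched as a lagged autocorrelation over a per-residue property profile. Two assumptions: the profile is standardized across the sequence itself (published descriptions often standardize across the twenty amino acids instead), and the example property values are made up.

```python
import math
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("constant input cannot be standardized")
    return [(v - mean) / std for v in values]


def moreau_broto(values: Sequence[float], max_lag: int) -> List[float]:
    """Return AC(lag) = sum_i values[i] * values[i + lag] / (N - lag) for lag = 1..max_lag."""
    n = len(values)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < len(values)")
    return [
        sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    # Hypothetical per-residue property profile (e.g. a hydrophobicity scale).
    profile = [0.62, -2.53, -0.78, 0.29, 1.19, 0.62]
    print(moreau_broto(standardize(profile), max_lag=3))
```

Each physicochemical property contributes `max_lag` numbers, so the descriptor has a fixed length regardless of how long the protein is.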

International Journal of Molecular Sciences | 2017

An Ameliorated Prediction of Drug–Target Interactions Based on Multi-Scale Discrete Wavelet Transform and Network Features

Cong Shen; Yijie Ding; Jijun Tang; Xinying Xu; Fei Guo

The prediction of drug–target interactions (DTIs) via computational technology plays a crucial role in reducing experimental cost. A variety of state-of-the-art methods have been proposed to improve the accuracy of DTI predictions. In this paper, we propose a drug–target interaction predictor that adopts multi-scale discrete wavelet transform and network features (named DAWN) to solve the DTI prediction problem. We encode the drug molecule by a substructure fingerprint with a dictionary of substructure patterns. Simultaneously, we apply the discrete wavelet transform (DWT) to extract features from target sequences. Then, we concatenate and normalize the target, drug, and network features to construct feature vectors. The prediction model is obtained by feeding these feature vectors into the support vector machine (SVM) classifier. Extensive experimental results show that the prediction ability of DAWN is comparable to that of other DTI prediction schemes. The areas under the precision–recall curves (AUPRs) on the four datasets are 0.895 (Enzyme), 0.921 (Ion Channel), 0.786 (G-protein-coupled receptor, GPCR), and 0.603 (Nuclear Receptor), respectively.
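The multi-scale wavelet step in this abstract can be sketched with a Haar transform. Several choices here are assumptions rather than details from the abstract: the Haar wavelet itself, padding odd-length signals by repeating the last value, and summarizing each band by its maximum, minimum, mean and standard deviation.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One Haar DWT level on an even-length signal: (approximation, detail)."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def summary(coeffs: Sequence[float]) -> List[float]:
    """Max, min, mean and population standard deviation of one coefficient band."""
    mean = sum(coeffs) / len(coeffs)
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs))
    return [max(coeffs), min(coeffs), mean, std]


def multiscale_features(signal: Sequence[float], levels: int) -> List[float]:
    """Concatenate band summaries of each detail level and the final approximation."""
    current = list(signal)
    features: List[float] = []
    for _ in range(levels):
        if len(current) < 2:
            break
        if len(current) % 2:
            current.append(current[-1])  # assumed padding rule for odd lengths
        current, detail = haar_step(current)
        features.extend(summary(detail))
    features.extend(summary(current))
    return features


if __name__ == "__main__":
    # Hypothetical numeric encoding of a target sequence.
    print(multiscale_features([0.6, -2.5, -0.8, 0.3, 1.2, 0.6, 0.1], levels=2))
```

The detail bands capture rapid changes along the sequence while the final approximation captures the slow trend, so a sequence of any length maps to `4 * (levels + 1)` numbers when it is long enough for every level.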

Computational Biology and Chemistry | 2016

Protein-protein interface prediction based on hexagon structure similarity

Fei Guo; Yijie Ding; Shuai Cheng Li; Chao Shen; Lusheng Wang

Studies on protein-protein interaction are important in proteome research. Building more effective models based on sequence information, structure information and physicochemical characteristics is the key challenge in protein-protein interface prediction. In this paper, we study the protein-protein interface prediction problem. We propose a novel method for identifying residues on interfaces from an input protein with both sequence and 3D structure information, based on hexagon structure similarity. Experiments show that our method achieves better results than some state-of-the-art methods for identifying protein-protein interfaces. Compared with existing methods, our approach improves the F-measure value by at least 0.03. On a common dataset consisting of 41 complexes, our method has overall precision and recall values of 63% and 57%. On Benchmark v4.0, our method has overall precision and recall values of 55% and 56%. On CAPRI targets, our method has overall precision and recall values of 52% and 55%.
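The precision, recall and F-measure values reported in this abstract can be computed from sets of predicted and true interface residues as in the sketch below. The function name and the residue numbering are hypothetical, and how the paper aggregates per-complex values into "overall" figures is not stated here.

```python
from typing import Set, Tuple


def precision_recall_f(predicted: Set[int], actual: Set[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for predicted vs. true interface residues."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    predicted_residues = {12, 13, 14, 40}
    true_residues = {13, 14, 15}
    print(precision_recall_f(predicted_residues, true_residues))
```

The harmonic mean penalizes a predictor that gains recall simply by labeling most surface residues as interface.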

Molecules | 2017

Identification of DNA–protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information

Cong Shen; Yijie Ding; Jijun Tang; Jian Song; Fei Guo

DNA–protein interactions play pivotal roles in diverse biological processes and are essential for cell metabolism, and identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. A variety of state-of-the-art methods have been proposed to improve the accuracy of DNA–protein binding site prediction. Nevertheless, structure-based approaches are limited when 3D information is unavailable, and predictive performance can still be improved. In this paper, we present a competitive method called the Multi-scale Local Average Blocks (MLAB) algorithm to address this issue. Unlike structure-based approaches, MLAB not only extracts local evolutionary information from primary sequences but also uses predicted solvent accessibility. Moreover, the predictor of DNA–protein binding sites is built with an ensemble weighted sparse representation model with random under-sampling. To evaluate the performance of MLAB, we conduct comprehensive experiments on DNA–protein binding site prediction. MLAB gives MCC of 0.392, 0.315, 0.439 and 0.245 on PDNA-543, PDNA-41, PDNA-316 and PDNA-52 datasets, respectively. These results show that MLAB compares favorably with other outstanding methods. MCC for our method is increased by at least 0.053, 0.015 and 0.064 on PDNA-543, PDNA-41 and PDNA-316 datasets, respectively.
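The idea of multi-scale local average blocks can be sketched as follows. The block scheme is an assumption made for illustration: the rows of a per-residue feature window (for example PSSM rows) are split into n contiguous blocks for each scale n, and each block is averaged column by column. The paper's exact scales and window handling are not given in the abstract.

```python
from typing import List, Sequence, Tuple


def block_bounds(length: int, n_blocks: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) bounds splitting `length` rows into n_blocks contiguous blocks."""
    return [(i * length // n_blocks, (i + 1) * length // n_blocks) for i in range(n_blocks)]


def local_average_blocks(rows: Sequence[Sequence[float]], n_blocks: int) -> List[float]:
    """Average each block of rows column by column and concatenate the block means."""
    if not 0 < n_blocks <= len(rows):
        raise ValueError("n_blocks must satisfy 0 < n_blocks <= number of rows")
    features: List[float] = []
    for start, end in block_bounds(len(rows), n_blocks):
        block = rows[start:end]
        features.extend(sum(column) / len(block) for column in zip(*block))
    return features


def multi_scale_features(rows: Sequence[Sequence[float]], scales: Sequence[int]) -> List[float]:
    """Concatenate local average block features over several block counts."""
    features: List[float] = []
    for n_blocks in scales:
        features.extend(local_average_blocks(rows, n_blocks))
    return features


if __name__ == "__main__":
    # Hypothetical 4-residue window with 2 feature columns per residue.
    window = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(multi_scale_features(window, scales=[1, 2]))
```

Coarse scales describe the window as a whole while finer scales keep positional detail, so the concatenated vector mixes global and local information.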

Neurocomputing | 2018

Identification of drug-side effect association via multiple information integration with centered kernel alignment

Yijie Ding; Jijun Tang; Fei Guo

In medical research, drug discovery aims to develop drugs for patients who will benefit from them while avoiding side effects. However, traditional experiments are time-consuming and expensive. In recent years, computational approaches have provided many effective strategies to deal with this issue. Because the known associations between drugs and side effects are fewer than the unknown associations, the task can be seen as an imbalanced classification problem. Although several classification methods have been developed to predict drug-side effect associations, the performance of predictors can be further improved. In this paper, we propose a novel predictor of drug-side effect associations. First, we construct multiple kernels from drug space and side-effect space, respectively. Then, the corresponding kernels are linearly weighted by an optimized Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL) algorithm in the two spaces. Finally, Kronecker Regularized Least Squares (Kronecker RLS) is employed to fuse the drug kernel and the side-effect kernel and to identify drug-side effect associations. Compared with many existing methods, our proposed approach achieves better results on three benchmark datasets of drug-side effect associations. The values of Area Under the Precision Recall curve (AUPR) are 0.672, 0.679 and 0.675 on Pauwels’s dataset, Mizutani’s dataset and Liu’s dataset, respectively. The AUPRs are improved by at least 0.012, 0.013 and 0.014 on the three datasets. Experimental results show that our method performs strongly among other excellent approaches for identifying drug-side effect associations.
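The quantity at the heart of centered kernel alignment can be sketched in a few lines. The code below only computes the alignment score between two kernel matrices; the weight optimization of CKA-MKL and the Kronecker RLS step are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def center_kernel(k: Matrix) -> List[List[float]]:
    """Double-center a square kernel matrix (equivalent to H K H with H = I - 11^T / n)."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [k[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def frobenius_inner(a: Matrix, b: Matrix) -> float:
    """Sum of element-wise products of two equally sized matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def centered_alignment(k1: Matrix, k2: Matrix) -> float:
    """Cosine similarity, under the Frobenius inner product, of the centered kernels."""
    c1, c2 = center_kernel(k1), center_kernel(k2)
    denom = math.sqrt(frobenius_inner(c1, c1) * frobenius_inner(c2, c2))
    if denom == 0:
        raise ValueError("alignment is undefined for a kernel that is zero after centering")
    return frobenius_inner(c1, c2) / denom


if __name__ == "__main__":
    k_a = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
    k_b = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(centered_alignment(k_a, k_b))
```

In a multiple kernel learning setting, a score of this kind is typically evaluated between a weighted combination of base kernels and a target kernel built from the known labels, and the weights are chosen to make the score as large as possible.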
