IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2019
CrystalM: a multi-view fusion approach for protein crystallization prediction.
Abstract
Improving the accuracy of protein crystallization prediction is important for protein crystallization projects, because crystallization is a critical step in determining protein structure by X-ray crystallography. Here, we use a novel feature combination to construct a Support Vector Machine (SVM) model for predicting protein crystallization, called CrystalM. We extract six features to represent protein sequences: Average Block-Position Specific Scoring Matrix (AVBlock-PSSM), Average Block-Secondary Structure (AVBlock-SS), Global Encoding (GE), Pseudo-Position Specific Scoring Matrix (PsePSSM), Protscale, and Discrete Wavelet Transform-Position Specific Scoring Matrix (DWT-PSSM). We evaluate CrystalM by feeding these multi-view features into an SVM classifier, using two training datasets (TRAIN3587 and TEST3585's counterpart TRAIN3587, and TRAIN1500) together with their corresponding independent test datasets (TEST3585 and TEST500). The two training datasets are used for five-fold cross validation, and each test dataset is used to test the model built on its corresponding training dataset. Finally, we compare the performance of CrystalM with that of existing methods. For TRAIN3587 and TEST3585, CrystalM achieves the best ACC, SP and MCC compared with the previously best-performing methods in five-fold cross validation. The good performance on these datasets demonstrates the effectiveness of CrystalM, and the better performance on the large datasets further indicates its stability and superiority.
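The abstract reports ACC, SP and MCC without defining them. As a minimal sketch, assuming they denote accuracy, specificity and the Matthews correlation coefficient in their standard confusion-matrix forms, they could be computed as follows. The function name and the label convention (1 = crystallizable, 0 = non-crystallizable) are illustrative assumptions, not taken from the paper.

```python
import math


def crystallization_metrics(y_true, y_pred):
    """Compute ACC, SP and MCC for binary labels.

    Assumed convention (not from the paper): 1 = crystallizable, 0 = not.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    # Accuracy: fraction of all predictions that are correct.
    acc = (tp + tn) / total if total else 0.0
    # Specificity: fraction of true negatives among all actual negatives.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "SP": sp, "MCC": mcc}
```

In a five-fold cross validation, such a function would typically be applied to the held-out predictions of each fold, or to the pooled predictions across folds; the abstract does not state which variant the authors used.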