bioRxiv | 2019

SKIPHOS: non-kinase specific phosphorylation site prediction with random forests and amino acid skip-gram embeddings

 
 
 
 

Abstract


Phosphorylation, which is catalyzed by protein kinases, is one of the two most common and most widely studied essential post-translational modifications (PTMs) of proteins. It is known to regulate most cellular processes, including protein synthesis, cell division, signal transduction, cell growth, development and aging. Many phosphorylation site prediction models have been developed; they can be broadly categorized as kinase-specific or non-kinase specific (general). Kinase-specific models require, for training, a sufficiently large number of experimentally determined phosphorylation sites annotated with a given kinase, a requirement that is rarely met in practice: less than 3% of the phosphorylation sites known to date have been annotated with a responsible kinase. Only a few non-kinase specific phosphorylation site prediction models have been proposed to date. This study introduces a non-kinase specific phosphorylation site prediction model based on random forests built on top of a continuous distributed representation of amino acids. In the experiments, our method is compared with three recent methods: PhosphoSVM, iPhos-PreEn and RFPhos.
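
To make the idea of a distributed amino acid representation concrete, the following minimal sketch shows one way candidate sites could be turned into fixed-length feature vectors. It is an illustration under stated assumptions, not the implementation used in this study: the embedding table holds random placeholder vectors rather than trained skip-gram embeddings, and the window size, the padding symbol, the concatenation of per-residue vectors and all function names are hypothetical choices. Only the restriction to serine, threonine and tyrosine (S, T, Y) as candidate residues reflects standard knowledge about phosphorylation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for windows that run past the sequence ends


def make_placeholder_embeddings(dim=4, seed=0):
    """Random stand-in vectors; a real model would learn these with skip-gram."""
    rng = random.Random(seed)
    table = {aa: [rng.uniform(-1, 1) for _ in range(dim)] for aa in AMINO_ACIDS}
    table[PAD] = [0.0] * dim  # padding maps to the zero vector (an assumption)
    return table


def candidate_windows(sequence, flank=3, targets="STY"):
    """Yield (position, window) for every S/T/Y residue, padding at the ends."""
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue in targets:
            # The window is centred on residue i and spans 2 * flank + 1 symbols.
            yield i, padded[i : i + 2 * flank + 1]


def window_features(window, table):
    """Concatenate the per-residue vectors into one fixed-length feature vector."""
    features = []
    for residue in window:
        # Unknown symbols fall back to the padding vector (an assumption).
        features.extend(table.get(residue, table[PAD]))
    return features


table = make_placeholder_embeddings()
sequence = "MKSPTAYL"  # toy sequence, not real data
rows = [(pos, window_features(win, table)) for pos, win in candidate_windows(sequence)]
```

Feature vectors of this kind, paired with labels for known phosphorylated and non-phosphorylated sites, could then be passed to a random forest classifier such as scikit-learn's `RandomForestClassifier`.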

DOI 10.1101/793794
Language English
Journal bioRxiv

Full Text