J. Intell. Fuzzy Syst. | 2021

iRspot-DCC: Recombination hot/ cold spots identification based on dinucleotide-based correlation coefficient and convolutional neural network

 
 
 
 

Abstract


The correct identification of gene recombination cold/hot spots is of great significance for studying meiotic recombination and genetic evolution. However, most of the existing recombination spots recognition methods ignore the global sequence information hidden in the DNA sequence, resulting in their low recognition accuracy. A computational predictor called iRSpot-DCC was proposed in this paper to improve the accuracy of cold/hot spots identification. In this approach, we propose a feature extraction method based on dinucleotide correlation coefficients that focus more on extracting potential DNA global sequence information. Then, 234 representative features vectors are filtered by SVM weight calculation. Finally, a convolutional neural network with better performance than SVM is selected as a classifier. The experimental results of 5-fold cross-validation test on two standard benchmark datasets showed that the prediction accuracy of our recognition method reached 95.11%, and the Mathew correlation coefficient (MCC) reaches 90.04%, outperforming most other methods. Therefore, iRspot-DCC is a high-precision cold/hot spots identification method for gene recombination, which effectively extracts potential global sequence information from DNA sequences.

Volume 41
Pages 1309-1317
DOI 10.3233/JIFS-210213
Language English
Journal J. Intell. Fuzzy Syst.

Full Text