Bioinformatics | 2019

copMEM: finding maximal exact matches via sampling both genomes

 
 

Abstract


Motivation: Genome‐to‐genome comparisons require designating anchor points, which are given by maximal exact matches (MEMs) between their sequences. For large genomes this is a challenging problem, and the performance of existing solutions, even in parallel regimes, is not quite satisfactory.

Results: We present a new algorithm, copMEM, that sparsely samples both input genomes, with the two sampling steps being coprime. Despite being a single‐threaded implementation, copMEM computes all MEMs of minimum length 100 between the human and mouse genomes in less than 2 minutes, using 7 GB of RAM.

Availability and implementation: https://github.com/wbieniec/copmem

Supplementary data: Supplementary data are available at Bioinformatics online.
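The abstract does not spell out why the sampling steps should be coprime, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the reference is sampled at positions divisible by k1, the query at positions divisible by k2, and that a sampled seed of length seed_len must lie entirely inside a match of length min_len. All function and parameter names are hypothetical, and the seed length 44 is used purely for illustration. Under these assumptions, the code checks by brute force whether every possible match is guaranteed to contain a position pair that is sampled in both genomes.

```python
def shared_sample_offset(i, j, min_len, seed_len, k1, k2):
    """Smallest offset t into a match starting at reference position i and
    query position j such that i + t is a sampled reference position
    (multiple of k1), j + t is a sampled query position (multiple of k2),
    and a seed of length seed_len starting there still fits inside the
    first min_len characters of the match. Returns None if no such t exists.
    """
    for t in range(min_len - seed_len + 1):
        if (i + t) % k1 == 0 and (j + t) % k2 == 0:
            return t
    return None


def always_anchored(min_len, seed_len, k1, k2):
    """True if every match of length >= min_len contains a jointly sampled
    seed. Only i mod k1 and j mod k2 matter, so checking one full period
    of each residue is sufficient.
    """
    return all(
        shared_sample_offset(i, j, min_len, seed_len, k1, k2) is not None
        for i in range(k1)
        for j in range(k2)
    )


if __name__ == "__main__":
    # Coprime steps 8 and 7: product 56 <= 100 - 44 + 1 = 57.
    print(always_anchored(100, 44, 8, 7))
    # Steps 8 and 6 share the factor 2, so some alignments are never hit.
    print(always_anchored(100, 44, 8, 6))
```

With coprime steps, the two congruences on t have exactly one solution in every window of k1 * k2 consecutive offsets, so the guarantee holds whenever k1 * k2 <= min_len - seed_len + 1. When the steps share a common factor, some start-position pairs admit no common offset at all, however long the match is.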

Volume 35
Pages 677–678
DOI 10.1093/bioinformatics/bty670
Language English
Journal Bioinformatics
