IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2021

Consensus of All Solutions for Intractable Phylogenetic Tree Inference


Abstract


Solving median tree problems is a classic approach for inferring species trees from a collection of discordant gene trees. Median tree problems are typically NP-hard and are addressed with local search heuristics. Unfortunately, such heuristics generally lack provable correctness and precision. Algorithmic advances addressing this uncertainty have led to exact dynamic programming formulations suitable for solving a well-studied group of median tree problems for smaller phylogenetic analyses. However, these formulations allow computing only very few optimal species trees out of possibly many such trees, and phylogenetic studies often require the analysis of all optimal solutions through their consensus tree. Here, we describe a significant algorithmic modification of the dynamic programming formulations that computes the cluster counts of all optimal species trees, from which various types of consensus trees can be efficiently computed. Through experimental studies, we demonstrate that our parallel implementation of the modified dynamic programming formulation is more efficient than a previous implementation of the original formulation. Finally, we show that the parallel implementation can rapidly identify novel reassorted influenza A viruses, potentially facilitating pandemic preparedness efforts.
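
To illustrate how consensus trees can follow from cluster counts, the sketch below derives the clusters of a strict and a majority-rule consensus from a table of counts. It is a minimal illustration, not the implementation described in the article: the function name `consensus_clusters`, the dictionary layout, and the toy counts are all assumptions made for this example. A cluster is taken to be the set of leaves below a node of a rooted tree; the strict consensus keeps clusters found in every tree, and the majority-rule consensus keeps clusters found in more than half of them.

```python
from typing import Dict, FrozenSet, List

Cluster = FrozenSet[str]


def consensus_clusters(
    counts: Dict[Cluster, int], num_trees: int, threshold: float = 0.5
) -> List[Cluster]:
    """Select the clusters that define a consensus tree.

    counts    -- hypothetical table: cluster -> number of trees containing it
    num_trees -- total number of trees the counts were taken over
    threshold -- 1.0 gives the strict consensus (cluster in every tree);
                 smaller values keep clusters in MORE than that fraction
                 of trees (0.5 is the usual majority rule)
    """
    if num_trees <= 0:
        raise ValueError("num_trees must be positive")
    if not 0.5 <= threshold <= 1.0:
        # Below 0.5 the kept clusters could conflict with each other.
        raise ValueError("threshold must lie in [0.5, 1.0]")

    if threshold == 1.0:
        kept = [c for c, n in counts.items() if n == num_trees]
    else:
        kept = [c for c, n in counts.items() if n > threshold * num_trees]

    # Largest clusters first, ties broken alphabetically, so the nesting
    # order of the consensus tree can be read top-down.
    return sorted(kept, key=lambda c: (-len(c), sorted(c)))


# Toy counts (assumed data) for three rooted trees on taxa A, B, C, D:
#   ((A,B),(C,D))   (((A,B),C),D)   (((A,B),C),D)
counts = {
    frozenset("ABCD"): 3,
    frozenset("AB"): 3,
    frozenset("ABC"): 2,
    frozenset("CD"): 1,
}

majority = consensus_clusters(counts, num_trees=3)
strict = consensus_clusters(counts, num_trees=3, threshold=1.0)

print([sorted(c) for c in majority])
print([sorted(c) for c in strict])
```

Because only the counts and the total number of trees are consulted, the selection step never needs the individual trees, which is why a table of cluster counts suffices for these consensus types.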

Volume 18
Pages 149-161
DOI 10.1109/TCBB.2019.2947051
Language English
Journal IEEE/ACM Transactions on Computational Biology and Bioinformatics
