WiSPA: A new approach for dealing with widespread parasitism
Version dated: April 9, 2018
Benjamin Drinkwater, Angela Qiao, and Michael A. Charleston
School of Information Technologies, University of Sydney, NSW, 2006, Australia; School of Physical Sciences, University of Tasmania, TAS, 7005, Australia
Corresponding author:
Benjamin Drinkwater, School of Information Technologies (J12), University of Sydney, NSW, 2006, Australia; E-mail: [email protected]
Abstract. Traditionally, studies of coevolving systems have considered cases where a parasite may inhabit only a single host. The case where a parasite may infect many hosts, widespread parasitism, has until recently gained little traction. This is due in part to the computational complexity involved in reconstructing coevolutionary histories even where parasites may infect only a single host, a problem which is NP-Hard. Allowing parasites to inhabit more than one host has been seen to only further compound this computationally intractable problem. Recently, however, well-established algorithms for estimating the problem instance where a parasite may infect only a single host have been extended to handle widespread parasites. Although this has offered significant progress, it has been noted that these algorithms poorly handle parasites that inhabit phylogenetically distant hosts. In this work we extend these previous algorithms to handle cases where parasites inhabit phylogenetically distant hosts using an additional evolutionary event which we call spread. Our new framework is shown to infer significantly more congruent coevolutionary histories compared to existing methods over both synthetic and biological data sets. We then apply the newly proposed algorithm, which we call WiSPA (WideSpread Parasitism Analyser), to the well studied coevolutionary system of Primates and
Enterobius (pinworms), where existing methods have been unable to reconcile the widespread parasitism present without permitting additional divergence events. Using WiSPA and the new biological event, spread, we provide the first statistically significant coevolutionary hypothesis for this system.

(Keywords: Coevolution, Phylogeny, Widespread Parasitism, NP-Hard)

Coevolutionary research has long focused on the area of parasitism due to the health risks which parasites pose to the human population (Charleston and Perkins 2006). Parasites and the associations they form with their hosts have been responsible for a number of the worst emerging diseases impacting global health today, including
Ebola (Peterson et al. 2004), HIV (Siddall 1997), and malaria (Mu et al. 2005). Further research into the field of coevolution aims to uncover the deep coevolutionary associations formed by parasitic behaviour, to provide further insights into these deadly diseases (Charleston and Galvani 2006).

We often define coevolutionary systems in terms of an independent phylogeny and a corresponding dependent phylogeny which have formed a macro-scale coevolutionary bond. One approach that is often applied to evaluating such evolutionary relationships is the field of cophylogenetics, which provides a framework to evaluate whether evolutionary histories have coevolved or have evolved independently (Charleston 2003).

As a result of host–parasite systems’ long association with the field of cophylogenetics, coevolutionary systems often describe the independent and dependent phylogenies as the host (H) and parasite (P) respectively. Cophylogenetic analysis, however, can be applied to all forms of coevolutionary dependence including: biogeography (Toit et al. 2013), host–pathogen systems (Mu et al. 2005), genes and the species that house them (Page and Charleston 1997), plant–insect interactions (Gómez-Acevedo et al. 2010), plant–fungi dynamics (Refrégier et al. 2008), host–parasitoid relationships (Stireman 2005), and Batesian and Müllerian mimicry between species (Ceccarelli and Crozier 2007; Cuthill and Charleston 2012).

The coevolutionary interactions between P and H are represented by the associations (ϕ) between their leaves, based on evidence of parasites inhabiting or infecting their host(s). These associations can be used to infer the level of host specificity of the parasite species with respect to its host(s) (Poulin 2011).
Within this context, high host specificity is the case where a particular parasite infects a single host species, while low host specificity is the case where a parasite may infect many host species.

Coevolutionary analysis of systems with high host specificity focuses on the reconstruction of the parasite’s evolutionary history with respect to the host, which is known as a cophylogeny mapping. When recovering a map (Φ) using cophylogeny mapping, the aim is to recover the most congruent solution with a minimum total event cost, while ensuring the associations are conserved using the four known coevolutionary events: codivergence, duplication, host switch and loss (Ronquist 1995).

A codivergence event is a concurrent divergence of both the host and parasite lineages. A high concentration of codivergence events leads to an increase in the level of congruence between P and H, and is therefore a strong indicator of coevolution (Page 2002). A duplication event is an independent divergence of the parasite where both new lineages continue to track the host (Tuller et al. 2010). A host switch event is an independent divergence of the parasite where one parasite shifts from the initial host lineage (take–off edge) to a new lineage (landing edge) in the host, while the second parasite continues to track the host (Kim et al. 1985). We call these three events divergence events, as they consider all cases of divergence in the parasite’s coevolutionary history. By contrast, loss arises from three indistinguishable processes: lineage sorting (or “missing the boat”), extinction, or sampling failure. As these processes all produce the same effect we represent these as a loss event (Paterson et al. 2003). We refer to the problem of reconstructing a map using only these four events as the restricted cophylogeny reconstruction problem.

Methods for recovering maps have mainly focused on the restricted cophylogeny reconstruction problem.
This is due in part to the initial set of biological events being unable to reconstruct the evolutionary history of parasites with low host specificity, along with the hypothesis that coevolution only occurs in systems with a one-to-one association between parasites and their hosts (Poulin 2011). This hypothesis, however, only considers a select set of coevolving systems and precludes many observed coevolutionary systems where the parasites maintain low host specificity as an evolutionary advantage. In a comprehensive study of plant–insect interactions, Nosil and Mooers (2005) demonstrated that while insects often form exclusive associations with their hosts, this is not always the case. Butterflies and bark beetles were shown to often be associated with many host plant species. This case is not unusual, with Stireman’s (2005) study of endoparasitoids and their Tachinidae (fly) hosts also demonstrating the evolutionary advantage of low host specificity. These results affirm that ongoing cophylogeny mapping modelling must consider the general case where parasites are permitted to inhabit more than one host (widespread parasitism), to accurately model all coevolutionary interrelationships.

As described above, modelling widespread parasitism using cophylogeny mapping requires additional biological events beyond the original four events derived by Ronquist (1995). Currently, failure-to-diverge is the only event which has successfully been applied to handle widespread parasitic events within a cophylogeny mapping framework. It is defined as the case where parasites maintain their ability to inhabit both hosts following a host divergence event, without the need for a divergence of the parasite lineage (Johnson et al. 2003). Failure-to-diverge is the case where there is an interruption of the gene flow between the host species while there remains gene flow within the parasite population (Poulin 2011).
A case where failure-to-diverge and the full set of divergence events are required to reconstruct a cophylogenetic history can be seen in Figure 1. We will here refer to events which are used to describe widespread parasite coevolution, such as failure-to-diverge, as widespread events. This is to differentiate such events from divergence and loss events.

The failure-to-diverge event allows for the recovery of solutions for all conceivable cases of widespread parasitism for cophylogenetic reconstructions. However, these solutions may have a high number of loss events when widespread parasites inhabit phylogenetically distant leaves in H. This is due to the limitation that a failure-to-diverge event occurs at the most recent common ancestor of the pair of inhabited host leaves (Banks and Paterson 2005), after which many loss events must be inferred to account for the observed parasite distribution.

Cophylogenetic reconstructions are evaluated using an event cost, similar to that of a parsimony score in phylogenetic reconstructions (Charleston 2002). Reconstructing the minimum cost map requires that each divergence event, widespread event, and loss event be assigned a penalty cost. The set of costs for each event may be defined as a vector V = (C, D, W, L, F) where C, D, W, L, F represent the associative costs for each codivergence, duplication, host switch, loss, and failure-to-diverge respectively. The resultant map cost E can then be derived as:

E = αC + βD + γW + δL + εF    (1)

where α, β, γ, δ, ε represent the number of events for codivergence, duplication, host switch, loss, and failure-to-diverge respectively (Drinkwater and Charleston 2014a). Cophylogeny mapping algorithms aim to map P into H, where the number of codivergence events is maximised and the map cost E is minimised (Charleston and Libeskind-Hadas 2014).
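To make Equation (1) concrete, the following is a minimal Python sketch of the map cost as a weighted sum of event counts. The class and function names are hypothetical, and the cost values and event counts are purely illustrative assumptions rather than values taken from any analysis in this paper.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class CostVector:
    """Event costs V = (C, D, W, L, F), in the order used by Equation (1)."""
    C: float  # codivergence
    D: float  # duplication
    W: float  # host switch
    L: float  # loss
    F: float  # failure-to-diverge


@dataclass(frozen=True)
class EventCounts:
    """Event counts (alpha, beta, gamma, delta, epsilon) for a single map."""
    alpha: int    # codivergences
    beta: int     # duplications
    gamma: int    # host switches
    delta: int    # losses
    epsilon: int  # failure-to-diverge events


def map_cost(counts: EventCounts, costs: CostVector) -> float:
    """E = alpha*C + beta*D + gamma*W + delta*L + epsilon*F."""
    return sum(n * c for n, c in zip(astuple(counts), astuple(costs)))


# Illustrative cost scheme and two hypothetical maps (assumed values).
costs = CostVector(C=0, D=1, W=2, L=1, F=1)
map_a = EventCounts(alpha=0, beta=0, gamma=1, delta=1, epsilon=2)
map_b = EventCounts(alpha=1, beta=0, gamma=0, delta=0, epsilon=2)
print(map_cost(map_a, costs), map_cost(map_b, costs))  # prints "5 2"
```

Under this assumed scheme the second map is cheaper because a codivergence replaces a host switch and a loss, which is the kind of comparison a mapping algorithm performs when minimising E.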
Although such cost schemes may not evaluate coevolutionary scenarios exactly, particularly modelling preferential host switching (Charleston and Robertson 2002), this technique has been used to evaluate a large number of coevolutionary systems (Page 2002; Page et al. 2004; Jackson and Charleston 2004; Cruaud et al. 2012; Rivera-Parra et al. 2015).

Parsimony and event-based methodologies are often seen as less preferable to maximum likelihood methodologies. This is due to parsimony methods often relying on arbitrarily chosen cost schemes. While such a reliance is a limiting factor, parsimony and event-based methods may be used to reconstruct the most likely evolutionary history by assigning the negative log likelihood probabilities for each evolutionary event as the associated penalty cost for each event (Drinkwater and Charleston 2016). As a result there is a strong driver for fast mapping methods which can then be integrated into a coevolutionary likelihood framework (Charleston 2003), to complement maximum likelihood techniques (Baudet et al. 2015).

Unfortunately recovering the minimum cost map is known to be NP-Hard (Ovadia et al. 2011). This computational intractability is due to the exponential number of host switch locations that can arise due to the variable order of the internal nodes in the host tree (Doyon et al. 2010), and the exponential number of internal node orderings (Conow et al. 2010).

To mitigate this computational intractability two unique heuristics have been proposed. The first approach ignores the relative ordering of the internal nodes in the parasite phylogeny. This may lead to the order of evolutionary events, as defined by a reconciled map, contradicting the order of evolutionary events as defined by the parasite phylogeny (Doyon et al. 2010). Such a map is often referred to as biologically infeasible or time-inconsistent (Doyon et al. 2011). This approach has been applied in various methods (Merkle and Middendorf 2005; Merkle et al. 2010; Yodpinyanee et al.
2011), with the fastest known algorithm to date running in O(n) (Bansal et al. 2012).

To guarantee that solutions are time-consistent two properties must be ensured. A host switch’s take–off and landing edges must lie in the same time interval, which is an overlapping interval based on the edges’ distances from the root of H, and must maintain the partial ordering of P (Conow et al. 2010), as mentioned above. This requires that the relative order of the parasite phylogeny must be fixed, which has led to the second heuristic which fixes the internal node ordering of the host phylogeny. If the internal node ordering is fixed it is possible to solve the cophylogeny reconstruction problem in polynomial time. This simplified problem, often referred to as the dated tree reconciliation problem (Drinkwater and Charleston 2015a), has been applied within a number of algorithms (Doyon et al. 2011; Conow et al. 2010; Drinkwater and Charleston 2015b), with the fastest proposed to date running in O(n log n) (Bansal et al. 2012).

While the aim of the cophylogeny reconstruction problem is to recover the minimum cost map in terms of E, it is often valuable to infer the significance of the resultant map. In particular it is valuable to identify whether a resultant map provides a statistically significant signal that the apparent congruence is unlikely to have occurred simply by chance. Previously, analysis has applied a series of Bernoulli trials to analyse the significance of an inferred map, such as the analysis of pocket gophers and their parasitic chewing lice by Page (1994a). This is also the premise of the statistical evaluation tool ParaFit (Legendre et al. 2002), which randomises the associations or the parasite tree to identify whether the degree of congruence noted between the host and parasite tree could have occurred simply by chance.
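The randomisation idea behind such tests can be sketched generically in Python. This is only an illustration under stated assumptions: `cost_of` is a hypothetical callable standing in for whatever procedure scores an instance (for example the cost of an optimal map), associations are shuffled by permuting host sets among parasite leaves, and none of the names below come from ParaFit or Jane.

```python
import random
from typing import Callable, Dict, Hashable, List

# Associations phi: each parasite leaf maps to the host leaves it inhabits.
Associations = Dict[Hashable, List[Hashable]]


def shuffle_associations(phi: Associations, rng: random.Random) -> Associations:
    """Return a randomised instance by permuting host sets among parasites."""
    parasites = list(phi)
    host_sets = [phi[p] for p in parasites]
    rng.shuffle(host_sets)
    return dict(zip(parasites, host_sets))


def fraction_as_good(
    phi: Associations,
    cost_of: Callable[[Associations], float],
    trials: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of randomised instances whose cost is no worse than observed.

    A small value (for example at most 0.05) suggests that the observed
    congruence is unlikely to have arisen by chance.
    """
    rng = random.Random(seed)
    observed = cost_of(phi)
    hits = sum(
        1 for _ in range(trials)
        if cost_of(shuffle_associations(phi, rng)) <= observed
    )
    return hits / trials


if __name__ == "__main__":
    # Toy stand-in score (NOT a cophylogeny cost): number of parasites that
    # are not on their "matching" host.
    phi = {f"p{i}": [f"h{i}"] for i in range(6)}
    toy_cost = lambda a: sum(1 for p, hs in a.items() if hs != ["h" + p[1:]])
    print(fraction_as_good(phi, toy_cost, trials=200))
```

Passing the scoring procedure in as a parameter keeps the test independent of how the map cost is computed, so the same loop applies whether the associations or the parasite tree are randomised.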
This process can be replicated when using cophylogeny mapping, by producing randomised permutations of the initial tanglegram (either randomising the associations or the parasite tree), and computing the cost of the optimal map for each randomised instance (Page 1994a,b). If the initial tanglegram has a mapping cost, E, which is less than that of the randomised permutations in at least 95% of cases, then we may reject the null hypothesis that the observed congruence between the independent and dependent phylogenetic trees arose by chance. This feature is currently integrated into the most recent implementation of the Jane software tool (Conow et al. 2010).

Three recent software tools which have been designed to recover maps where widespread parasites are considered, using the failure-to-diverge widespread evolutionary event, are CoRe-PA (Merkle et al. 2010), Jane (Conow et al. 2010) and CoRe-ILP (Wieseke et al. 2015). CoRe-PA solves the cophylogeny reconstruction problem in polynomial time by relaxing the internal node ordering of the host tree. This approach, although potentially recovering solutions in quadratic time (Yodpinyanee et al. 2011), may recover solutions which are time-inconsistent (Doyon et al. 2011). Jane, in contrast, fixes the internal node order in the host tree and solves this instance using dynamic programming (Libeskind-Hadas and Charleston 2009). This approach guarantees that solutions are biologically feasible. As there are an exponential number of possible fixed node orderings, most techniques applying this approach leverage a genetic algorithm to recover the best possible solutions in a fixed period of time. Finally, CoRe-ILP applies an integer linear programming algorithm to solve the problem of maximising the total number of codivergence events within the reconciled map, which its developers have shown provides a robust estimation of the harder problem of finding the minimum cost map (Wieseke et al. 2015).
While each approach handles the computational intractability of recovering the divergence events in significantly different ways, all apply a common approach for recovering widespread events, where each failure-to-diverge event occurs at the most recent common ancestor of the parasite’s hosts, guaranteeing solutions can be recovered in all cases. This approach, while offering researchers the first set of tools for inferring coevolutionary systems which include widespread parasites, often infers maps with a high number of loss events polluting the coevolutionary signal.

Our work aims to expand on this research by constructing a new methodology which solves the cophylogeny reconstruction problem with widespread parasites, the widespread parasite problem, and which is able to overcome the high costs that are often associated with failure-to-diverge. As a result, this method will decrease the overall parsimony score, while potentially increasing the number of codivergence events within the reconciled map.

Methodology
Reintroducing an additional biological event for coevolutionary analysis
This work reintroduces an additional evolutionary event for inferring relationships of coevolutionary systems where widespread parasites are permitted, which we call spread. We propose that existing frameworks be updated to include spread as an additional widespread parasite event, to work in conjunction with failure-to-diverge. The inclusion of spread aims to more accurately reconcile the widespread parasites’ coevolutionary histories with respect to their hosts, by mitigating the high number of loss events which are associated with reconstructions that exclusively use failure-to-diverge, such as cases where parasites do not inhabit closely related hosts.

The spread event was first applied by Brooks (1991), to reconcile widespread parasites within the Brooks Parsimony Analysis framework, and was later proposed by Siddall and Perkins (2003), as an additional widespread evolutionary event to be integrated within TreeMap (Charleston 2012). Neither of these proposed models has been implemented, however, in part due to the additional computational complexity that their inclusion can give rise to.

The spread event is derived from a number of observed parasitic systems, such as the behaviour of chewing lice which infect their penguin hosts (?). It has been observed that a number of lice species switch between their penguin hosts at shared breeding grounds. This cannot be modelled using a host switch, as there is no divergence in the parasite lineage, nor can it be considered a failure-to-diverge event, as there is no evidence to support that the gene flow has been maintained for the chewing lice species. Rather, the lice species have recently spread to new hosts based on new opportunities that are presented.

This “spreading” behaviour has also been observed in lab experiments between nematodes and their Drosophila fly hosts (Jaenike and Dombeck 1998).
Nematodes’ general purpose genotypes allow each individual species to infect a high number of host species, allowing nematodes to infect distantly related Drosophila hosts which they would not be expected to encounter in nature. The infection of Drosophila does not require any evolutionary changes, so this cannot be modelled correctly using a host switch event, nor can it be described using failure-to-diverge, as the nematodes have not coexisted with their new hosts. This phenomenon therefore requires an additional biological event to model the observed behaviour, which spread successfully achieves.

The spread event also complements the theory of low host specificity, which asserts that species evolve specific mechanisms which allow them to inhabit multiple hosts, leaving the parasites less vulnerable to the evolutionary changes of a specific host species. Further, spread often provides more parsimonious solutions for widespread parasitism. Consider the coevolutionary system in Figure 2 (left). In the first reconstruction, Figure 2 (centre), the parasite has had O(n) opportunities to infect a new host species but failed to do so in all cases. This is highly unlikely compared to the alternate reconstruction using spread, where no loss events occur and the parasite simply infects a new host species based on new opportunities, such as the introduction of an infected host species (A) into the natural environment of host species (B).

The alternate map which uses spread, Figure 2 (right), is significantly more parsimonious for cases where the spread event is assigned a penalty cost similar to that of failure-to-diverge. In fact, spread would need to be assigned a cost n times that of a loss event for the solution to be considered more expensive. Therefore, as this biological event describes observed behaviour in nature and also allows for potentially more parsimonious maps, we argue that spread should be integrated into existing algorithms which aim to infer systems which present widespread parasites.
This is in line with previous assertions made by Brooks (1991), Page (1994a), and Siddall and Perkins (2003).

While often producing significantly cheaper solutions, spread may not always be possible, as it relies on host species collocation to permit the occurrence of a spread event, similar to host switch events (Clayton et al. 2004). Further research needs to be undertaken on how to model this, which would complement the existing field of research into preferential host switching (Charleston and Robertson 2002; Cuthill and Charleston 2013). We, however, do not consider this constraint herein, and assume spread is permissible between all hosts, as a means to present this evolutionary event’s value to widespread parasitism analysis.

Formally we define the spread event as a parasite lineage that, due to new opportunities, infects a new host lineage while maintaining its infection of its current host lineage. This event therefore consists of a shift of a subset of the parasite lineage from the initial host (the take-off edge) to a new host (the landing edge), occurring at some point after the host lineages have diverged.

This definition is derived from the existing definition of the host switch event, with which spread shares a number of common traits. Both events require the internal node ordering of the host phylogeny to be fixed, to ensure that the resultant map is time-consistent, and both events require that the take-off and landing edges share a common timing interval (Conow et al. 2010).
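The shared timing interval requirement can be illustrated with a small Python sketch. The encoding is an assumption made for illustration only: each host node is given an integer rank in the fixed internal node ordering (root first, all leaves at the present), an edge spans the interval between its parent's and child's ranks, and two edges are compatible when those intervals overlap. The function and variable names are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (parent node, child node) in the host tree


def edge_interval(edge: Edge, node_time: Dict[str, int]) -> Tuple[int, int]:
    """Time span of a host edge: from its parent's rank to its child's rank."""
    parent, child = edge
    return node_time[parent], node_time[child]


def share_timing_interval(
    take_off: Edge, landing: Edge, node_time: Dict[str, int]
) -> bool:
    """True if two distinct host edges coexist for some period of time."""
    start_a, end_a = edge_interval(take_off, node_time)
    start_b, end_b = edge_interval(landing, node_time)
    return take_off != landing and max(start_a, start_b) < min(end_a, end_b)


# Host tree ((A, B)x, (C, D)y)r with the fixed ordering r < x < y < leaves.
node_time = {"r": 0, "x": 1, "y": 2, "A": 3, "B": 3, "C": 3, "D": 3}

print(share_timing_interval(("x", "A"), ("y", "C"), node_time))  # True
print(share_timing_interval(("r", "x"), ("y", "C"), node_time))  # False
```

In the second call the edge from r to x ends (at rank 1) before the edge from y to C begins (at rank 2), so no spread or host switch between them would be time-consistent under this ordering.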
Spread events, unlike host switch events however, do not consist of a bifurcation, and as a result are not dependent on the internal node ordering of the parasite phylogeny, which results in spread being a more generalised version of a host switch event.

With the addition of the spread event we are required to update the cost vector V to include S, the cost of a spread event, along with updating the objective function E as follows:

E = αC + βD + γW + δL + εF + ζS    (2)

where ζ represents the number of spread events in the resultant map, Φ. It is important to note that even with the addition of spread as an additional evolutionary event, the total number of widespread events in Equation (2) (ε + ζ) is equal to the number of failure-to-diverge events in Equation (1) (ε).

Using this new formulation of the objective function E, we derive a polynomially bounded algorithm to solve the cophylogeny reconstruction problem where widespread parasites are permitted (the widespread parasite problem), where the internal nodes in the host phylogeny are fixed. The proposed method extends the Improved Node Mapping algorithm (Drinkwater and Charleston 2014a, 2015a), to recover solutions to the widespread parasite problem using both spread and failure-to-diverge. The described methodology, however, is designed so that it can be integrated into other mapping algorithms which leverage a fixed internal node ordering, such as Edge Mapping (Yodpinyanee et al. 2011) and Slicing (Doyon et al. 2010). This method is then integrated into an existing metaheuristic framework similar to that implemented in Jane (Conow et al. 2010), which allows for this method to provide robust estimations for the widespread parasite problem in a reasonable period of time.

The order of evolutionary events
Along with integrating both spread and failure-to-diverge within a common framework, our model aims to provide additional flexibility when inferring the position of a widespread event within the reconciled map. Current state-of-the-art algorithms, such as Edge Mapping applied in Jane, provide strict bounds on the position where a failure-to-diverge event may occur. These bounds only allow for a subset of the total number of mapping locations to be considered prior to the widespread event. For example, consider the tanglegram in Figure 3 (left), which includes a single widespread parasite. The minimum cost map inferred by Jane, Figure 3 (right), for this specific instance includes 2 failure-to-diverge events, 1 host switch event and 1 loss event.

There is an alternate reconstruction for this system, however, where the minimum cost map contains 2 failure-to-diverge events and 1 codivergence event, Figure 3 (centre). Under all previously published cost schemes this map is considered more parsimonious. Jane is unable to reconstruct this specific map, however, as its algorithm enforces constraints on the number of locations where divergence events may be placed following a set of widespread events. This bound is appropriate in that it allows for a faster running time; in cases such as this, however, it may give rise to reconciliations which are less parsimonious.

As computational power continues to become faster and cheaper, it is important to consider alternate algorithms which, while potentially less efficient, may provide more parsimonious solutions to the widespread parasite problem. This is the concept which is explored herein, where our proposed framework permits divergence events to occur at all feasible positions prior to and following a set of widespread events.
This will increase the asymptotic complexity relative to Jane, in the hope of providing a more parsimonious reconciliation for the resultant maps.

We will show that, by increasing the asymptotic complexity by a factor of n, it is possible to provide a solution to the widespread parasite problem which considers both failure-to-diverge and spread, and which provides a significant accuracy improvement, representative of one of the largest single improvements offered by a coevolutionary analysis technique since Charleston (1998) proposed the Jungle data structure.

Integrating Widespread Events into Improved Node Mapping
In this section we introduce a series of amendments which, when applied to the Improved Node Mapping algorithm, allow for both failure-to-diverge and spread events to be recovered optimally when reconciling a pair of phylogenetic trees. Prior implementations of node mapping by Libeskind-Hadas and Charleston (2009), and Drinkwater and Charleston (2014a; 2015b) have only considered the case where a parasite may inhabit a single host. The amendment described herein not only updates the Improved Node Mapping algorithm to support widespread events, but also resolves the problems associated with algorithms such as Jane which were discussed in the previous section.

The updated version of the Improved Node Mapping algorithm, which we will refer to as WiSPA (WideSpread Parasitism Analyser), can be more easily described as a two step process. The first step is a reconciliation step which recovers all feasible optimal widespread events based on an event cost vector, V. The second step recovers the optimal set of divergence events using the previously derived set of widespread events. By handling these two complex sets of operations in series it is possible to ensure that a polynomially bounded algorithm may be derived for solving the widespread parasite problem, where the internal node ordering of the host phylogeny is fixed.

Our proposed algorithm reconciles the set of optimal widespread events by constructing a set of widespread association trees, a process which is derived from an earlier method proposed by Page (1994b). These association trees are then leveraged to recover the optimal set of widespread events, mirroring much of the work proposed by both Page (1994b) and Siddall and Perkins (2003).
Unlike their previous attempts to solve the widespread parasitism problem, which applied a greedy algorithm, our approach applies a dynamic programming algorithm to ensure that all feasible states may be considered, avoiding the potential problems that may arise due to local minima or excluding large subsets of the problem space.

Reconstructing Widespread Associations as Trees
To reconcile the set of widespread events for each widespread parasite (p_i), we propose a method which translates the set of widespread associations for the parasite node p_i to a bifurcating tree. A similar model was first used by Page (1994b) to reconcile the widespread parasitism identified in the pocket gopher and chewing lice coevolutionary system introduced by Hafner and Nadler (1988).

The constructed trees, which are referred to herein as Association Trees (a_i), are a set of trees A = (a_1, ..., a_n), which may be used to infer the optimal set of widespread events, where we prove that:

Lemma 1.
An association tree (a_i) is a bifurcating tree constructed based on the associations, ϕ, present for the parasite leaf node p_i which mirrors the topology of H, such that a_i may infer the maximum number of widespread events.

Proof. Consider the parasite leaf node p_i with k widespread associations. The maximum number of possible widespread events is the case where k failure-to-diverge events may be recovered. This is because for all cases it is possible to recover k spread events for all trees, due to the construction of the host tree, such that all leaves share a common timing interval (the present) (Conow et al. 2010). Therefore an association tree, a_i, which maximises the number of failure-to-diverge events will maximise the total number of possible widespread events.

A mirrored tree constructed in line with Fahrenholz’s (1913) Rule will always permit k failure-to-diverge events, as each internal node in the mirrored tree corresponds to an internal node in the host tree (Fahrenholz 1913; Paterson and Banks 2001). Therefore if we construct a_i for p_i which mirrors H based on the associations ϕ, then we will maximise the number of possible widespread events which are able to be recovered using the association tree.

By maximising the number of possible events recovered, we ensure that the optimal set of widespread events may be inferred. This is due to the order of widespread events being unbounded, as widespread events are not dependent on the internal order of P. Therefore, this approach, while ensuring that an optimal set of widespread events is recovered, does not guarantee that the order of events inferred is correct, as there is no information in the initial problem instance to provide such an inference. Further information about the problem instance would be required to infer the order of widespread events, such as the geographical history of both host and parasite.
This, along with the consideration of preferential spread events, is a topic to be considered in later revisions of the WiSPA algorithm.

In order to construct the association trees in line with Fahrenholz’s (1913) Rule, we find the unique subtree where each leaf in the association tree is associated with one of the initial widespread associations. The recovery of an association tree can therefore be reduced to the problem of recovering the homeomorphic subgraph of H for the leaves inhabited by the widespread parasite p_i (Lozano et al. 2007).

To construct the homeomorphic subgraph we apply the pruning algorithm described in detail by Lozano et al. (2007), which creates a copy of H where only the host leaves inhabited by p_i are retained. This algorithm is applied for each widespread parasite, which gives rise to the set A = (a_1, ..., a_n).

The association trees A = (a_1, ..., a_n) mirror H based on each parasite’s widespread associations, and therefore each leaf in the association tree a_i has a one-to-one association with a leaf in H, such that each leaf node in the association tree a_i maps to a unique leaf node in H. This property is not one that is imposed on a standard tanglegram, but is an important property that we leverage to reconstruct widespread events (see next section).

Recovering Widespread Events
The widespread events considered herein are derived from existing divergence events, and therefore existing techniques for the recovery of divergence events may be applied to their recovery. This approach, while used by Page (1994b) to infer failure-to-diverge events, has not been applied to reconcile multiple widespread parasites within a single common framework. To achieve this, each widespread event is considered as the divergence event which most closely matches its behaviour. Under this constraint a failure-to-diverge is recovered from an association tree as a codivergence, and a spread is recovered from an association tree as a host switch.

This is possible as both the optimal codivergence and failure-to-diverge events occur at the most recent common ancestor of their children (Johnson et al. 2003), while the optimal host switch and spread events may be recovered using an implementation of the level ancestor problem (Drinkwater and Charleston 2014a). Each widespread event mirrors one of these two divergence events, with the exception that neither includes a divergence. This is resolved by creating pseudo-divergence events through the construction of the association trees, in line with the reconciliation model proposed by Siddall and Perkins (2003).

Therefore, as both widespread events can be inferred from existing divergence events, we may apply existing solutions to the dated tree reconciliation problem as a means to recover the optimal divergence events for the set of association trees, $A$. This may in turn be leveraged to infer the optimal set of widespread events for each association tree, $a_i$. This is possible as association trees are constructed with a one-to-one mapping, which mitigates the need for duplication events if host switch events are permitted. This is important as there is no widespread equivalent for a duplication event.
Exploiting this imposed property of each association tree, we can reconstruct the map for each association tree where only codivergence, host switch and loss events are permitted; that is, by running the existing Improved Node Mapping algorithm with a cost vector of $(F, \infty, S, L)$, where the costs for failure-to-diverge ($F$) and spread ($S$) replace the costs for codivergence and host switch respectively.

The widespread events are inferred from the recovered mappings by relabelling each codivergence as a failure-to-diverge and each host switch as a spread. This process requires that each divergence event in the dynamic programming table generated by the Improved Node Mapping algorithm be replaced with its corresponding widespread event. The inferred widespread events are then retained within a dynamic programming table $d_i$, which contains all the optimal widespread events for the parasite $p_i$. Therefore the result of mapping the complete set of association trees $A$ into $H$ gives rise to a set of dynamic programming tables $\omega = (d_1, \ldots, d_n)$, containing all the optimal widespread events for the parasite tree $P$.

The ReconcileWidespreadParasite algorithm applied to infer the complete set of optimal widespread events for a parasite tree $P$ with respect to its host $H$ is defined in Figure 4. This process outlines a new approach to reconciling the incongruence caused by widespread parasitism. It integrates a number of existing approaches proposed by Page (1994b), Siddall and Perkins (2003), and Brooks (1991), along with the works of Banks and Paterson (2005) and Johnson et al. (2003), into a single reconciliation methodology within the context of dated trees. This in turn provides the foundations to infer the optimal set of divergence events, which is described in detail in the following section.

Recovering Divergence Events
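The recovery of divergence events consumes the widespread events produced by the relabelling step of the previous section, so a minimal sketch of that step is given first. It only illustrates the relabelling and the $(F, \infty, S, L)$ cost vector. It does not implement Improved Node Mapping, and the event names, function names and example costs are assumptions made for illustration.

```python
import math

# Hypothetical event names; the paper does not prescribe an encoding.
RELABEL = {
    "codivergence": "failure-to-diverge",
    "host switch": "spread",
    "loss": "loss",
}


def association_tree_costs(F, S, L):
    """Cost vector (F, inf, S, L), ordered as
    (codivergence, duplication, host switch, loss).
    Duplication is priced at infinity so that it is never selected."""
    return {
        "codivergence": F,
        "duplication": math.inf,
        "host switch": S,
        "loss": L,
    }


def relabel_widespread(events):
    """Rename the divergence events recovered on an association tree
    as their widespread counterparts."""
    relabelled = []
    for event in events:
        if event not in RELABEL:
            # Duplication has no widespread equivalent.
            raise ValueError(f"no widespread equivalent for {event!r}")
        relabelled.append(RELABEL[event])
    return relabelled


def total_cost(events, costs):
    """Sum the cost of a list of divergence events under `costs`."""
    return sum(costs[event] for event in events)


# Example with assumed costs F = 1, S = 2, L = 1.
costs = association_tree_costs(F=1, S=2, L=1)
recovered = ["codivergence", "host switch", "loss"]
widespread = relabel_widespread(recovered)
```

Pricing duplication at infinity rather than removing it keeps the cost vector the same shape as the one used for ordinary divergence events, which is why an existing solver can in principle be reused unchanged.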
The recovery of the divergence events using WiSPA is derived from the traditional bottom-up (taxa-to-root) dynamic programming approaches applied in the Slicing (Doyon et al. 2010), Edge Mapping (Yodpinyanee et al. 2011), and Improved Node Mapping (Drinkwater and Charleston 2014a) algorithms. Each of these existing approaches incrementally constructs its resultant map using a series of sub-solutions, leading to the recovery of an optimal mapping of the parasite phylogeny into its host.

One such method, the Improved Node Mapping algorithm, is a cubic time solution for the dated tree reconciliation problem. This approach reconciles the incongruence displayed for each parasite node by selecting the optimal divergence event based on the set of mapping sites for its children. This requires a nested set of loops so that every mapping site for the left child is compared with every mapping site of the right child.

The WiSPA algorithm, unlike Improved Node Mapping, considers multiple optimal locations for each parasite node, rather than a single optimal mapping site, which has been the premise of all cubic time solutions to this problem (Doyon et al. 2010; Yodpinyanee et al. 2011; Drinkwater and Charleston 2014a). In this more complex case, rather than an optimal mapping site for each pair of children, the optimal mapping may occur at any location, with the sub-solution defined by the widespread mapping. Initial analysis may suggest that this additional complexity induces a further set of quadratic comparisons. This, however, can be mitigated by exploiting a number of topological properties of the underlying dynamic programming table and of the resultant map, both of which are explored within this section.

The first point which should be noted is that the dynamic programming table traditionally only retains a single mapping site for the parasite leaves (Drinkwater and Charleston 2015a).
It is possible, however, to retain multiple mappings for each parasite leaf node; in fact a mapping site can be retained for each parasite node to all locations in the host tree without increasing the asymptotic complexity of the Improved Node Mapping algorithm. Exploiting this property of the dynamic programming table to handle widespread parasites was introduced as a possibility during the formulation of the original Improved Node Mapping algorithm (Drinkwater and Charleston 2014a). Allowing a set of mappings of this size for each parasite node allows the optimal set of mapping sites stored for the root of the association tree $a_i$, corresponding to the parasite node $p_i$ in question, to be retained within the dynamic programming table. This in turn allows the optimal set of widespread events to be considered within the context of inferring a set of optimal divergence events.

To handle the additional complexity which arises from multiple widespread parasite events, the Improved Node Mapping algorithm has been split such that it considers three possible scenarios: the left child is treated as a widespread parasite, the right child is treated as a widespread parasite, or neither child is treated as a widespread parasite, as can be seen in Figure 5. In the case where the left or right child is treated as a widespread event (lines 17 - 28), the divergence event may be placed at an earlier time period than the root of the widespread event (either a failure-to-diverge or a spread event), as long as the relative order of the parasite phylogeny is preserved. That is, while a divergence event may be placed prior to multiple widespread events, it may never be placed at a position prior to one of its descendants. Prior in this context refers to a position closer to the present, as the solutions are constructed in reverse, from the tips to the root.
Providing this additional degree of flexibility when reconciling the incongruence between the parasite and its host requires that all positions within the host be considered as a possible mapping site for each pair of points, adding an additional nested loop (on lines 18-21 and 25-27, in the case where the left or right child is widespread respectively).

Handling widespread parasitism in this fashion results in either a widespread event being the root of a sub-solution, such as a failure-to-diverge event occurring at a time period in the past before any of the divergence events, or a divergence event occurring as the root of a sub-solution. In the latter case the sub-solution is from this point onwards considered as a standard mapping site, in line with previous models. In the case where the root is a widespread event, its parent too will be required to traverse the complete search space to allocate the optimal divergence event, and therefore an additional layer of computational complexity is added with this approach, discussed in detail in the following section.

The major benefit of this model is that in the case where both the left and right children are widespread parasites, it is possible to abstract away any possible compounding complexity by considering each widespread parasite in series. This avoids a further increase in the computational complexity of the proposed model, and is achieved by noting that a divergence event may not occur prior to the root of both widespread events, as this would reflect the occurrence of divergence events, and as such one of the two widespread events must be considered as a root, or the divergence event itself may be the root of both lineages. This is in line with the theory considered by Fish (2013) in the development of the third version of Jane, which was the first version to consider widespread parasitism.

In the final case (lines 29 - 32) neither the left nor the right child is rooted by a widespread event.
In this case the complexity of widespread parasitism is already fully explained within the sub-solution, or the sub-solution does not contain any widespread events. In either case such a sub-solution is processed in line with the existing Improved Node Mapping algorithm, and no further changes are required to the algorithm presented in Figure 5 to handle this case.

Therefore, by reconciling the optimal set of divergence events based on the optimal set of widespread evolutionary events retained within $\omega$, it is possible to handle multiple widespread evolutionary events and to overcome the limitations identified within the algorithm applied by Jane. In the following section the asymptotic complexity of the algorithm is discussed, where we prove that the additional accuracy provided by the model is achieved by adding only an $O(n)$ increase in the complexity of the Improved Node Mapping algorithm, resulting in a complexity which is comparable to software tools such as Costscape and Eventscape (Libeskind-Hadas et al. 2014) and significantly faster than Jane 1 (Conow et al. 2010) and the Jungle method (Charleston 1998, 2012), all of which are popular coevolutionary analysis methods.

Complexity Analysis
The WiSPA algorithm is designed using a series of underlying algorithms to provide the most accurate algorithm for handling widespread parasites. In this section we analyse the associated computational complexity of this approach, and how it compares to existing algorithms applied within the field of coevolutionary analysis of widespread parasites.

For the complexity analysis considered herein we consider the number of nodes in the host tree to be $2n - 1$; that is, the host tree contains $n$ leaves and $n - 1$ internal nodes. Similarly, the parasite tree has $2m - 1$ nodes, consisting of $m$ leaves and $m - 1$ internal nodes. The number of associations for each parasite is $k$, where $k \le n$; that is, no single parasite may have more associations than there are unique host leaves to infect.

The WiSPA algorithm is composed of two computationally expensive steps. The first is the processing required to handle the parasites which inhabit more than one host, specifically constructing and solving the association trees (lines 7-12 in Figure 5), and the second is processing the divergence events, the internal nodes in the parasite tree (lines 14-34 in Figure 5).

Processing the leaves in the parasite tree requires the construction of $O(m)$ association trees, each of size $O(k)$. The association trees are constructed using an application of Lozano et al.'s (2007) homeomorphic subgraph pruning algorithm, which runs in $O(kn)$ for each of the $O(m)$ widespread parasites. Therefore the time required to construct the set of association trees, $A$, is $O(kmn)$. The solutions for each of these association trees are stored in an array of dynamic programming tables, $\omega$, which contains $O(m)$ elements, each a table of size $O(nk)$. Therefore the space requirement for this step is $O(kmn)$. Solving each of the association trees requires $O(kn)$ time, and therefore, as $O(m)$ trees need to be solved, the total running time of this step is $O(kmn)$.

Reconciling the divergence events (lines 14-34 in Figure 5) requires mapping the parasite into the host using the additional information retained within the list of dynamic programming tables, $\omega$. As the additional widespread information is retained within $\omega$, no additional space is required compared to the original dynamic programming table construction defined by Drinkwater and Charleston (2014a), and therefore the space required is $O(mn)$.
The running time, however, requires an additional step which involves iterating over all the possible widespread locations, of which there may be $O(k)$ for each mapping site considered, and therefore the running time is extended from $O(mn^2)$, as defined within the original implementation of the Improved Node Mapping algorithm, to $O(kmn^2)$.

This time and space complexity is significant, as the complexity of the proposed algorithm grows linearly with the number of additional widespread associations added to the tanglegram. That is, while the Improved Node Mapping algorithm runs in cubic time when considering only $O(n)$ associations, our proposed algorithm runs in quartic time when considering $O(n^2)$ associations. Therefore in the case where only one additional widespread association is added to each parasite, the total running time only increases by a factor of two. This is significant as the number of widespread associations for each parasite will never be of size $O(n)$ under any realistic biological scenario. For example, if we consider the 15 previously published biological data sets introduced later to validate our model, it may be observed that on average the rate of widespread parasitism is approximately 7%, which compared to the size of the data sets is less than $\log n$. This suggests that while the worst case running time for the proposed algorithm is quartic, the running time in practice is more comparable to existing cubic time algorithms.

Implementation and Validation
The algorithm proposed herein is implemented in Java and is available as a platform-independent jar file. The underlying algorithm is integrated into a genetic algorithm, which is designed to run in a multithreaded environment, similar to the design proposed by Conow et al. (2010). The advantage of Conow et al.'s (2010) model is the near-linear speedup possible using multi-core systems.

Jane 4 (Conow et al. 2010) was selected as the algorithm to validate the theoretical model presented herein. Jane is the best candidate to evaluate the performance of WiSPA as both methods are designed to minimise the total cost of all evolutionary events considered, and both leverage an underlying algorithm to solve the dated tree reconciliation problem as a means to inform their metaheuristic framework. CoRe-PA and CoRe-ILP were not considered, as for the size of the data sets considered herein Jane has been shown to outperform both these techniques (Conow et al. 2010; Wieseke et al. 2015).

The evaluation of our new model is broken into two parts. The first considers Jane and WiSPA's accuracy over 500 synthetic data sets which display varying degrees of widespread parasitism. Then Jane and WiSPA are evaluated over 15 previously published biological systems. In both evaluations two key metrics are considered. The first is the total cost of the reconciliation inferred by each model, and the second is the total number of codivergence events present in the inferred reconciliation. Each of these two values represents the degree of congruence of the reconciled map, where the aim of coevolutionary analysis is to infer the minimum cost map with the maximum number of codivergence events (Littlewood 2003). Therefore each model will be validated on how well it conforms to these criteria. These two key metrics align with prior analysis of coevolutionary techniques (Page 1994a, 2002; Ronquist 1998; Conow et al. 2010; Wieseke et al. 2015), and are considered the best two signals for recovering a biologically relevant map which most accurately represents the actual coevolutionary interactions.

Along with demonstrating the effectiveness of the generalised model applied within WiSPA, this analysis also aims to infer the significance of the inclusion of the spread evolutionary event. This was achieved by considering three different costs for the spread event: a cost of one, which is equal to the cost of a failure-to-diverge event; a cost of two, the same cost as a host switch event, the evolutionary event which is most similar to the spread event; and finally the case where a spread event is not permitted (in essence assigned a cost of $\infty$).

Each of these values for the spread event is integrated into the Jungle cost scheme (Ronquist 2003) to provide three unique event cost schemes for this analysis: $V_1 = (0, 1, 2, 1, 1, 1)$, $V_2 = (0, 1, 2, 1, 1, 2)$, and $V_3 = (0, 1, 2, 1, 1, \infty)$. When evaluating the performance of Jane using these cost vectors the recovered map will always have the same cost, as varying the cost of spread has no bearing on the cost of the map recovered by Jane.

Discussion and Analysis
The analysis performed herein, using a combination of synthetic and biological data sets, demonstrates that WiSPA is able to converge on maps with a lower event cost and a significantly higher number of codivergence events. The significance of this result is that even in the case where spread is not permitted, WiSPA is observed to perform 2% better in practice. This only improves as spread is permitted and its associated penalty cost is reduced.

Following this successful result we continue our analysis of WiSPA and Jane by considering the Primate – Enterobius biological data set in further detail. This biological system has long been considered a likely coevolutionary system; however, the inability of prior models to handle the widespread parasitism resulted in no previous model providing a statistically significant coevolutionary hypothesis for the sub-clade considered herein. We demonstrate that while Jane may be unable to provide such a hypothesis, WiSPA is able, for certain values of spread, to provide a statistically significant coevolutionary hypothesis for this system.
Overall Performance on Synthetic Data
The synthetic data sets used to evaluate WiSPA were previously constructed using the Cophylogeny Generation Model (Core-Gen) (Keller-Schmidt et al. 2011). These coevolutionary histories were constructed using a standard Yule Model, a common synthetic tree generation model applied in phylogenetics (Steel and McKenzie 2001). Keller-Schmidt et al. (2011) previously constructed 1000 synthetic data sets, from which we have randomly selected 50 to provide a baseline for this comparison.

As Core-Gen can only generate coevolutionary systems where each parasite infects a single host, the existing data sets needed to be modified to induce widespread parasitism. From each of the 50 synthetic data sets initially selected, nine additional data sets were created by randomly applying additional widespread associations to the initial data sets. These nine new data sets present a varied degree of widespread parasitism, with the aim of modelling a decreasing rate of host specificity.

For the nine data sets we allowed additional widespread events to be added for each parasite, such that the maximum rate of additional widespread parasitism was 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, and 50% of the total available host species for each of the nine data sets respectively. This is applied by selecting each parasite node and allowing the parasite to infect a random number of additional host species between 0 and $p \times n$, where $p$ is the rate of widespread parasitism for the specified synthetic system and $n$ is the number of host taxa.
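The procedure just described can be sketched as follows. This is an illustration rather than the script used for the study: the function name, the dictionary encoding of associations, the uniform choice of extra hosts, and the rounding of $p \times n$ down to an integer are all assumptions.

```python
import random


def add_widespread_associations(associations, hosts, rate, rng):
    """Return a copy of `associations` in which every parasite gains
    between 0 and int(rate * len(hosts)) extra hosts, drawn uniformly
    at random from the hosts it does not already infect."""
    max_extra = int(rate * len(hosts))  # assumed rounding of p * n
    widened = {}
    for parasite, current in associations.items():
        candidates = [h for h in hosts if h not in current]
        extra = rng.randint(0, min(max_extra, len(candidates)))
        widened[parasite] = set(current) | set(rng.sample(candidates, extra))
    return widened


# Hypothetical baseline: ten hosts, each parasite on a single host.
hosts = [f"h{i}" for i in range(10)]
baseline = {f"p{i}": {f"h{i}"} for i in range(10)}

# The nine maximum rates of additional widespread parasitism.
rates = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
data_sets = [add_widespread_associations(baseline, hosts, r, rng) for r in rates]
```

Because extra hosts are only ever added, the original associations of the baseline tanglegram are preserved in every derived data set, which is what allows the baseline number of codivergence events to serve as a reference.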
It should be noted that this model is a crude representation of widespread parasitism in nature; however, it provides a robust set of synthetic data sets to compare Jane and WiSPA over varying degrees of widespread parasitism.

This technique is also advantageous as it provides a baseline for the number of codivergence events present in the original tanglegram, which can be compared to the number of codivergence events recovered by each technique as the rate of widespread parasitism increases. Therefore both the rate at which the total event cost increases and the rate at which the total number of codivergence events decreases may be tracked for the models considered within this analysis.

This is captured in Figure 6, where the total event cost (left) and total number of codivergence events (right) are recorded for the ten data sets (including the baseline) for the four models considered. These two plots provide the best insight to date into the benefits of the spread event, particularly in increasing the total number of codivergence events compared to using failure-to-diverge exclusively.

In the case where spread is set to one a reduction of more than 50% is achieved in the parsimony score, with this reduction only increasing as the rate of widespread parasitism increases. This reduction is complemented by a nine-fold increase in the number of codivergence events. A similar trend is observed where spread is set to two. Here a reduction of more than 35% is achieved in the parsimony score, with this reduction only increasing as the rate of widespread parasitism increases. The reduction in this case is complemented by an eight-fold increase in the number of codivergence events.

In both cases where spread is permitted a significant improvement in the congruence of the reconciled maps is achieved.
In the case where spread is not permitted such a drastic improvement is not observed, although there is a 1% decrease in the total parsimony cost compared to Jane, with an increase of 2% in the number of codivergence events. While nowhere near as large as in the case where spread is permitted, this improvement is important: it represents our model in the worst case, yet it still outperforms Jane, and it only improves as the cost of spread is decreased.

Overall Performance on Real Data
The performance over the synthetic data sets demonstrates the value of the generalised model applied within WiSPA, along with the advantage of applying the spread event for the analysis of systems presenting widespread parasitism. As noted, however, the model applied to generate the synthetic data sets does not provide the best representation of widespread parasitism within a biological system, although it is the first synthetic model which attempts to model this phenomenon. Therefore our analysis also compares the performance of Jane and WiSPA over biological data sets.

For this analysis 15 biological data sets were selected to compare WiSPA with the latest version of Jane. These biological systems cover 10 biological phenomena, including but not limited to parasitism (Hafner and Nadler 1988), plant–insect interactions (Gómez-Acevedo et al. 2010), coevolutionary dynamics between a virus and its host (Jackson and Charleston 2004), mutualism (McLeish and Van Noort 2012), parasitoidism (Murray et al. 2013), and plant–fungal coevolution (Refrégier et al. 2008). The complete list of biological systems included in this analysis, and the coevolutionary interrelationships each system expresses, is given in Table 1. These data sets were selected to evaluate whether spread can assist in recovering more parsimonious reconstructions, along with evaluating the newly proposed model for reconciling widespread parasitism.
This analysis, along with evaluating the cost of each reconciliation, also considers the number of codivergence events recovered as another means of evaluating the congruence inferred by each technique considered herein.

It should be noted that while 15 data sets does not compare to the 500 data sets considered in the previous section, this selection of biological data represents the largest collection of coevolutionary systems displaying widespread parasitism assembled to date. While larger collections exist for the case where parasites only infect a single host, such as the 102 data sets catalogued by Drinkwater and Charleston (2014b), this is the first and therefore largest selection of widespread coevolutionary systems aggregated to date.

The results of this comparison are displayed in Table 2 and show a significant improvement in both the reconciliation's cost and the total number of codivergence events when including the spread event in the reconstruction of the parasites' evolutionary history with respect to their host. In all cases where spread was assigned a cost of one, the newly proposed algorithm found a solution that was at least as parsimonious as Jane's, with the majority of cases inferring a solution which was significantly more parsimonious and had a higher number of codivergence events. Over the 15 data sets there was an observed reduction of 35% in the event cost, and an increase of 21% in the number of codivergence events. A similar trend was observed in the case where spread was assigned a cost of two. Here in all cases WiSPA was able to find a solution which was at least as parsimonious in terms of event cost, with a number of cases where WiSPA was able to infer a reconciliation which was significantly more parsimonious and had a higher number of codivergence events.
Over the 15 data sets there was an observed reduction of 22% in the event cost and an increase of 18% in the number of codivergence events.

This demonstrates the value of the spread event for coevolutionary analysis, where a significant reduction in the total parsimony cost may be achieved using the spread event, with the largest single reduction providing a 55% decrease in the total event cost. These results match the benefits observed over the synthetic data sets, providing further evidence of the value of adopting the spread event for widespread analysis.

In the case where spread was not permitted, WiSPA was still able to outperform Jane, although the improvement was not as pronounced. Overall there was a 2% reduction in the total cost of the 15 maps, with no difference in the total number of codivergence events inferred across the 15 systems. This is still a significant result, as this is the worst case performance of the newly proposed model for reconciling widespread parasitism, and even then we are able to present an improvement of 2%. While in the case where spread is not permitted many systems perform only as well as under Jane, there is one particular system which displays a significant improvement. The cost of the ant–wasp parasitoid coevolutionary system is reduced by 25% by allowing a divergence event to occur prior to multiple widespread events. Such a reduction is the difference between a map which is only cheaper than 61.19% of random solutions, as in the case of Jane, and a map which is cheaper than 94.96% of random solutions, as in the case of WiSPA. These results are based on the randomisation test undertaken using Jane, using 10000 random instances.

While in all cases WiSPA was able to recover a map whose cost was equal to or less than that of the map recovered by Jane, it should be noted that there was a case where Jane was able to outperform WiSPA in terms of inferring a map where the total number of codivergence events was higher.
In the RNA Virus example, which has been marked in bold in Table 2, it can be seen that Jane's best reconciliation contains 5 codivergence events while WiSPA only infers a map with 4. In both cases the recovered map has a cost of 15, and as such current significance testing considers these two models equivalent. If subjected to the same randomisation test considered for the ant–wasp parasitoid coevolutionary system, neither map is considered significant, ( p = 0 .

Spread provides stronger evidence for Primate – Enterobius Coevolution
Primate and Enterobius (pinworms) have long been considered as a possible coevolutionary system (Cameron 1929; Sandosham 1950; Sorci et al. 1997; Hugot 1999). This hypothesis is due to the high degree of congruence which has been observed between these two phylogenetic trees (Hugot 1999). Brooks and Glen (1982) identified that while the observed congruence within this system strongly supported coevolution, there remained a subset of the Primate and Enterobius tanglegram which did not appear to provide evidence of coevolution. In particular it was noted that the infection of both Hylobatidae (gibbon) and Homo sapiens (human) by the species E. vermicularis could not be explained by traditional coevolutionary models.

This failure by cladistic models was due to the inability of coevolutionary analysis to reconcile widespread parasitism. This was later rectified as part of SBPA, which Brooks and McLennan (2003) applied to this system, although once again no specific modelling was provided for the relationships of the species E. vermicularis, which inhabits both humans and gibbons. This unexplained sub-clade has also been considered by cophylogeny models as well as cladistic approaches, with Ronquist (1997) proposing that this inconsistency was due to a recent host switch event from gibbons to humans. One weakness with this hypothesis, however, is that it assumes that E. vermicularis has diverged during the infection of humans, which current evidence does not support (Brooks and Glen 1982). As a result, a complete hypothesis which reconciles the observed data within this potential coevolutionary system, and in particular this sub-clade, has remained elusive.

While initial coevolutionary analysis assumed widespread parasitism cannot occur in a coevolutionary context (Poulin 2011), it has gradually become more accepted as potentially occurring, depending on the nature of the coevolutionary system considered. Laboratory experiments have shown that Enterobius displays a low host specificity, with results as early as Sandosham (1950) noting that a number of species of Enterobius infected phylogenetically distant primates which were held in captivity and which would not associate with one another in the wild due to vast geographical diversity. From this evidence it does not seem infeasible that
E. vermicularis may also be able to infect multiple host species.

The infection of humans has a higher probability due to humans no longer being bound by their biogeographical environment. This hypothesis agrees with existing coevolutionary analysis focusing on tapeworms, which have been shown to have a low host specificity whereby species were able to infect humans during their dispersal from Africa 2.5 million years ago (Hoberg et al. 2001).

Therefore, we attempt to provide a coevolutionary explanation applying widespread parasitism to this sub-clade using the methodologies applied in both Jane and WiSPA. We first evaluate the two recovered maps from Jane and WiSPA and discuss their inferred sets of biological events and their implications. These maps are then evaluated statistically to determine whether either method rejects the null hypothesis that these two phylogenetic trees are independent of one another. For this analysis we apply the Jungle cost scheme (Ronquist 2003) including spread with a cost of both one and two.

To provide a fair statistical analysis we generate all feasible widespread systems which include one additional widespread association between the parasite and its host. In total there are 10000 systems where the host and parasite phylogenies are fixed and the associations are randomised. By generating a single instance of all possible maps, we guarantee that no bias is introduced using different randomisation techniques for each model. These systems may be generated by computing all possible association pairs, of which there are $5^4$, and multiplying this by the total number of unique additional associations that may be applied, which is $(5 - 1) \times 4$. The minus one is due to the inability to apply more than one association between a single parasite and a single host. Therefore the total number of unique systems that may be generated for the Primate and Enterobius (pinworms) system presented in Figure 7 is 10000 ($5^4 \times (5 - 1) \times 4$).

[...] $V_1 = (0, 1, 2, 1, 1, 1)$ and $V_2 = (0, 1, 2, 1, 1, 2)$ is visualised in Figure 8 (right). In both cases the same map was inferred, where the only difference was that the recovered spread event costs more in the latter case. This map consists of three codivergences, one loss event and one spread, which has a resultant cost of two or three respectively. This map provides the hypothesis that this system has been coevolving throughout its evolutionary history, with all divergence events indicative of coevolution. The widespread event for
E. vermicularis in this map is explained using a recent spread event from gibbon to human. As previously discussed, spread requires that the hosts are biologically collocated at the time spread occurs. This collocation can be explained in this case because humans are no longer geographically bound, and therefore the potential for spread is significantly higher than for other, geographically bound primates. The loss event in this reconstruction can be explained by integrating Ronquist's (1997) prior hypothesis. In particular he noted that the introduction of E. vermicularis into humans may have caused the extinction of a species of Enterobius with a common ancestor of E. anthropopitheci.

If we compare WiSPA's recovered map to the same 10000 unique instances that were used to evaluate Jane's map, there is strong evidence for coevolution in this case (Figure 9, center and right). Using the Wilson score interval we converge on a confidence interval for the case where spread is a cost of one and two of 0 . , . . , . . , . Primate / Enterobius tanglegram can be explained using widespread parasitism when applying the spread event. In particular we note that WiSPA is the only method that is able to recover a widespread solution to this instance and provide a statistically significant signal for coevolution for this evolutionary system.
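The count of 10000 randomised systems described above can be checked by brute-force enumeration. The following is a minimal sketch, not the authors' implementation: it assumes each of the four parasites is first assigned to one of the five hosts and that exactly one parasite then receives one additional, different host. The function name `enumerate_widespread_systems` and the dictionary representation are hypothetical choices made for illustration; the taxon labels are taken from Figure 7.

```python
from itertools import product

# Taxon labels as they appear in Figure 7.
HOSTS = ["gibbon", "human", "chimpanzee", "gorilla", "orangutan"]
PARASITES = ["verm.", "anth.", "lero.", "buck."]


def enumerate_widespread_systems(hosts, parasites):
    """Yield association maps (parasite -> set of hosts).

    Hypothetical helper: every parasite gets one base host, then exactly
    one parasite gets one extra host that differs from its base host.
    """
    for base in product(hosts, repeat=len(parasites)):
        for index, parasite in enumerate(parasites):
            for extra_host in hosts:
                if extra_host == base[index]:
                    # A parasite cannot be associated with the same host twice.
                    continue
                associations = {p: {h} for p, h in zip(parasites, base)}
                associations[parasite].add(extra_host)
                yield associations


systems = list(enumerate_widespread_systems(HOSTS, PARASITES))
print(len(systems))  # 5**4 * (5 - 1) * 4 = 10000
```

Counted this way, the tally is over (base assignment, additional association) combinations, so a widespread parasite with hosts A and B is generated once with A as the base host and once with B.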
Conclusion
This work presents a new model for reconciling the incongruence that may arise between a pair of phylogenetic trees where parasites are permitted to inhabit more than one host. While this variant of the cophylogeny reconstruction problem has often been considered to be computationally complex, we provide a polynomial-time solution in the case where there exists timing information for the host phylogeny. In the case where such timing information is unavailable, we provide a metaheuristic framework which applies our underlying algorithm, which is shown to be the most accurate model for widespread parasitism produced to date.

The accuracy improvement present within our proposed model (WiSPA) is due to its inclusion of an additional widespread evolutionary event, which we refer to as spread, along with its more generalised framework for inferring the optimal set of widespread events. The additional widespread evolutionary event applied herein is derived from a number of previous coevolutionary models, along with observed parasitic behaviour in nature and the laboratory, where the inclusion of the spread event alone has been shown to provide an accuracy improvement of over 55%.

The accuracy improvement comes at a cost, however: our model is shown to be an order of magnitude slower than the current state-of-the-art algorithm applied in Jane. While this algorithm is more computationally expensive than the algorithms applied in the latest version of Jane (Libeskind-Hadas 2015) and CoRe-PA (Merkle et al. 2010), it is still far superior to Jane 1 (Conow et al. 2010) and the Jungle model applied in TreeMap, both of which have been applied to successfully analyse a number of coevolutionary systems, and it is also asymptotically more efficient than the tools within the Xscape framework (Libeskind-Hadas et al. 2014), indicating that our model is capable of analysing biological data sets.

Finally we applied WiSPA to the well-studied sub-clade of the coevolutionary system of Primate and Enterobius. Since this sub-clade was identified by Brooks and Glen (1982), no satisfactory reconciliation of this sub-clade has been derived. We have shown that while this has eluded prior models, WiSPA is able to provide a statistically significant hypothesis for this sub-clade which complements the existing theory of Primate / Enterobius coevolution, and also provides a plausible biological model consistent with the broader understanding of primate–parasite coevolution.

This result, coupled with the results from comparing WiSPA and Jane over the synthetic and biological data sets considered herein, demonstrates the value of our proposed model. Not only does WiSPA provide the flexibility of an additional evolutionary event to explain the incongruence caused by widespread taxa, but it also provides further flexibility in reconciling the conflict that may arise when dealing with the order of widespread and divergence events. As such we argue for the adoption of this new model to provide additional insights into the complex problem of reconciling the coevolutionary associations of widespread taxa.
References

Banks, J. and A. Paterson. 2005. Multi-Host Parasite Species in Cophylogenetic Studies. International Journal for Parasitology 35:741–746.
Bansal, M. S., E. J. Alm, and M. Kellis. 2012. Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss. Bioinformatics 28:i283–i291.
Baudet, C., B. Donati, B. Sinaimeri, P. Crescenzi, C. Gautier, C. Matias, and M.-F. Sagot. 2015. Cophylogeny Reconstruction via an Approximate Bayesian Computation. Systematic Biology 64:416–431.
Brooks, D. R. 1991. Phylogeny, ecology, and behavior: a research program in comparative biology. University of Chicago Press.
Brooks, D. R. and D. R. Glen. 1982. Pinworms and primates: a case study in coevolution. Proceedings of the Helminthological Society of Washington 49:76–85.
Brooks, D. R. and D. A. McLennan. 2003. Extending phylogenetic studies of coevolution: secondary Brooks parsimony analysis, parasites, and the Great Apes. Cladistics 19:104–119.
Cameron, T. 1929. The species of Enterobius Leach, in primates. Journal of Helminthology 7:161–182.
Carbone, L., R. A. Harris, S. Gnerre, K. R. Veeramah, B. Lorente-Galdos, J. Huddleston, T. J. Meyer, J. Herrero, C. Roos, B. Aken, et al. 2014. Gibbon genome and the fast karyotype evolution of small apes. Nature 513:195–201.
Ceccarelli, F. and R. Crozier. 2007. Dynamics of the evolution of Batesian mimicry: molecular phylogenetic analysis of ant-mimicking Myrmarachne (Araneae: Salticidae) species and their ant models. Journal of Evolutionary Biology 20:286–295.
Charleston, M. 1998. Jungles: A new solution to the Host/Parasite Phylogeny Reconciliation Problem. Mathematical Biosciences 149:191–223.
Charleston, M. 2012. Download TreeMap 3 here.
Charleston, M. and A. Galvani. 2006. A cophylogenetic perspective on host-pathogen evolution. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 71:145.
Charleston, M. and R. Libeskind-Hadas. 2014. Event-Based Cophylogenetic Comparative Analysis. Pages 465–480 in Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer.
Charleston, M. and D. Robertson. 2002. Preferential host switching by primate lentiviruses can account for phylogenetic similarity with the primate phylogeny. Systematic Biology 51:528–535.
Charleston, M. A. 2002. Principles of cophylogenetic maps. Pages 122–147 in Biological Evolution and Statistical Physics. Springer.
Charleston, M. A. 2003. Recent results in cophylogeny mapping. Advances in Parasitology 54:303–330.
Charleston, M. A. and S. L. Perkins. 2006. Traversing the Tangle: Algorithms and Applications for Cophylogenetic Studies. Journal of Biomedical Informatics 39:62–71.
Clayton, D. H., S. E. Bush, and K. P. Johnson. 2004. Ecology of congruence: past meets present. Systematic Biology 53:165–173.
Conow, C., D. Fielder, Y. Ovadia, and R. Libeskind-Hadas. 2010. Jane: a new tool for the Cophylogeny Reconstruction Problem. Algorithms for Molecular Biology 5:16.
Cruaud, A., N. Rønsted, B. Chantarasuwan, L. S. Chou, W. L. Clement, A. Couloux, B. Cousins, G. Genson, R. D. Harrison, P. E. Hanson, et al. 2012. An extreme case of plant–insect codiversification: figs and fig-pollinating wasps. Systematic Biology 61:1029–1047.
Cuthill, J. H. and M. Charleston. 2012. Phylogenetic Codivergence Supports Coevolution of Mimetic Heliconius Butterflies. PLoS One 7:e36464.
Cuthill, J. H. and M. A. Charleston. 2013. A simple model explains the dynamics of preferential host switching among mammal RNA viruses. Evolution 67:980–990.
Doyon, J.-P., V. Ranwez, V. Daubin, and V. Berry. 2011. Models, Algorithms and Programs for Phylogeny Reconciliation. Briefings in Bioinformatics 12:392–400.
Doyon, J.-P., C. Scornavacca, K. Y. Gorbunov, G. J. Szöllősi, V. Ranwez, and V. Berry. 2010. An Efficient Algorithm for Gene / Species Trees Parsimonious Reconciliation with Losses, Duplications and Transfers. Pages 93–108 in Comparative Genomics. Springer.
Drinkwater, B. and M. Charleston. 2016. RASCAL: A randomised approach for coevolutionary analysis. Journal of Computational Biology 23:218–227.
Drinkwater, B. and M. A. Charleston. 2014a. An Improved Node Mapping Algorithm for the Cophylogeny Reconstruction Problem. Coevolution 2:1–17.
Drinkwater, B. and M. A. Charleston. 2014b. Introducing TreeCollapse: A novel greedy algorithm to solve the Cophylogeny Reconstruction Problem. BMC Bioinformatics 15:S14.
Drinkwater, B. and M. A. Charleston. 2015a. A Sub-quadratic Time and Space Complexity Solution for the Dated Tree Reconciliation Problem for Select Tree Topologies. Pages 93–107 in Algorithms in Bioinformatics. Springer.
Drinkwater, B. and M. A. Charleston. 2015b. A time and space complexity reduction for coevolutionary analysis of trees generated under both a Yule and Uniform model. Computational Biology and Chemistry 57:61–71.
Escudero, M. 2015. Phylogenetic congruence of parasitic smut fungi (Anthracoidea, Anthracoideaceae) and their host plants (Carex, Cyperaceae): Cospeciation or host-shift speciation? American Journal of Botany 102:1108–1114.
Fahrenholz, H. 1913. Ectoparasiten und abstammungslehre. Zoologischer Anzeiger 41:371–374.
Fish, B. 2013. The Cophylogeny Reconstruction Problem. Pomona College.
Gómez-Acevedo, S., L. Rico-Arce, A. Delgado-Salinas, S. Magallón, and L. E. Eguiarte. 2010. Neotropical mutualism between Acacia and Pseudomyrmex: phylogeny and divergence times. Molecular Phylogenetics and Evolution 56:393–408.
Hafner, M. S. and S. A. Nadler. 1988. Phylogenetic trees support the coevolution of parasites and their hosts. Nature.
Hendricks, S. A., M. E. Flannery, and G. S. Spicer. 2013. Cophylogeny of quill mites from the genus Syringophilopsis (Acari: Syringophilidae in Algorithms in Bioinformatics. Springer.
Martínez-Aquino, A., F. S. Ceccarelli, L. E. Eguiarte, E. Vázquez-Domínguez, and G. P.-P. de León. 2014. Do the historical biogeography and evolutionary history of the digenean margotrema spp. across central mexico mirror those of their freshwater fish hosts (goodeinae)? PLoS One 9:e101700.
McLeish, M. J. and S. Van Noort. 2012. Codivergence and multiple Host Species use by Fig Wasp Populations of the Ficus Pollination Mutualism. BMC Evolutionary Biology 12:1.
Mendlova, M., Y. Desdevises, K. Civáňová, A. Pariselle, and A. Šimková. 2012. Monogeneans of West African cichlid fish: evolution and cophylogenetic interactions. PLoS One 7:e37268.
Merkle, D. and M. Middendorf. 2005. Reconstruction of the cophylogenetic history of related phylogenetic trees with divergence timing information. Theory in Biosciences 123:277–299.
Merkle, D., M. Middendorf, and N. Wieseke. 2010. A parameter-adaptive dynamic programming approach for inferring cophylogenies. BMC Bioinformatics 11:S60.
Mu, J., D. A. Joy, J. Duan, Y. Huang, J. Carlton, J. Walker, J. Barnwell, P. Beerli, M. Charleston, O. Pybus, et al. 2005. Host switch leads to emergence of plasmodium vivax malaria in humans. Molecular Biology and Evolution 22:1686–1693.
Murray, E. A., A. E. Carmichael, and J. M. Heraty. 2013. Ancient host shifts followed by host conservatism in a group of ant parasitoids. Proceedings of the Royal Society of London B: Biological Sciences 280:20130495.
Nosil, P. and A. Mooers. 2005. Testing hypotheses about ecological specialization using phylogenetic trees. Evolution 59:2256–2263.
Ovadia, Y., D. Fielder, C. Conow, and R. Libeskind-Hadas. 2011. The Cophylogeny Reconstruction Problem is NP-Complete. Journal of Computational Biology 18:59–65.
Page, R. D., R. H. Cruickshank, M. Dickens, R. W. Furness, M. Kennedy, R. L. Palma, and V. S. Smith. 2004. Phylogeny of Philoceanus complex seabird lice (Phthiraptera: Ischnocera) inferred from mitochondrial dna sequences. Molecular Phylogenetics and Evolution 30:633–652.
Page, R. D. M. 1994a. Maps Between Trees and Cladistic Analysis of Historical Associations Among Genes, organisms, and areas. Systematic Biology 43:58–77.
Page, R. D. M. 1994b. Parallel Phylogenies: Reconstructing the History of Host-Parasite Assemblages. Cladistics 10:155–173.
Page, R. D. M. 2002. Tangled Trees: Phylogeny, Cospeciation, and Coevolution. University of Chicago Press, Chicago.
Page, R. D. M. and M. A. Charleston. 1997. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution 7:231–240.
Paterson, A. and R. Poulin. 1999. Have Chondracanthid Copepods co-speciated with their Teleost hosts? Systematic Parasitology 44:79–85.
Paterson, A. M. and J. Banks. 2001. Analytical approaches to measuring cospeciation of host and parasites: through a glass, darkly. International Journal for Parasitology 31:1012–1022.
Paterson, A. M., R. L. Palma, and R. D. Gray. 2003. Drowning on arrival, missing the boat, and X-events: How likely are sorting events. Tangled Trees: Phylogeny, Cospeciation, and Coevolution Pages 287–309.
Peterson, A. T., J. T. Bauer, and J. N. Mills. 2004. Ecologic and geographic distribution of filovirus disease.
Poulin, R. 2011. Evolutionary Ecology of Parasites. Princeton University Press.
Refrégier, G., M. Le Gac, F. Jabbour, A. Widmer, J. A. Shykoff, R. Yockteng, M. E. Hood, and T. Giraud. 2008. Cophylogeny of the anther smut fungi and their caryophyllaceous hosts: prevalence of host shifts and importance of delimiting parasite species for inferring cospeciation. BMC Evolutionary Biology 8:100.
Rivera-Parra, J. L., I. I. Levin, K. P. Johnson, and P. G. Parker. 2015. Lineage sorting in multihost parasites: Eidmanniella albescens and fregatiella aurifasciata on seabirds from the Galapagos Islands. Ecology and Evolution.
Ronquist, F. 1995. Reconstructing the history of host-parasite associations using generalised parsimony. Cladistics 11:73–89.
Ronquist, F. 1997. Phylogenetic approaches in coevolution and biogeography. Zoologica Scripta 26:313–322.
Ronquist, F. 1998. Three-Dimensional Cost-Matrix Optimization and Maximum Cospeciation. Cladistics 14:167–172.
Ronquist, F. 2003. Parsimony analysis of coevolving species associations. Tangled Trees: Phylogeny, Cospeciation and Coevolution Pages 22–64.
Sandosham, A. 1950. On Enterobius vermicularis (Linnaeus, 1758) and Some Related Species from Primates and Rodent. Journal of Helminthology 24:171–204.
Siddall, M. E. 1997. The AIDS pandemic is new, but is HIV not new? Cladistics 13:267–273.
Siddall, M. E. and S. L. Perkins. 2003. Brooks Parsimony Analysis: a valiant failure. Cladistics 19:554–564.
Sorci, G., S. Morand, and J.-P. Hugot. 1997. Host–parasite coevolution: comparative evidence for covariation of life history traits in primates and oxyurid parasites. Proceedings of the Royal Society of London B: Biological Sciences 264:285–289.
Steel, M. and A. McKenzie. 2001. Properties of phylogenetic trees generated by Yule-type speciation models. Mathematical Biosciences 170:91–112.
Stireman, J. 2005. The evolution of generalization? Parasitoid flies and the perils of inferring host range evolution from phylogenies. Journal of Evolutionary Biology 18:325–336.
Toit, N., B. Vuuren, S. Matthee, and C. Matthee. 2013. Biogeography and host-related factors trump parasite life history: limited congruence among the genetic structures of specific ectoparasitic lice and their rodent hosts. Molecular Ecology 22:5185–5204.
Tuller, T., H. Birin, U. Gophna, M. Kupiec, and E. Ruppin. 2010. Reconstructing ancestral gene content by coevolution. Genome Research 20:122–132.
Viale, E., I. Martinez-Sañudo, J. Brown, M. Simonato, V. Girolami, A. Squartini, A. Bressan, M. Faccoli, and L. Mazzon. 2015.
Pattern of association between endemic Hawaiian fruit flies (Diptera, Tephritidae) and their symbiotic bacteria: Evidence of cospeciation events and proposal of “Candidatus Stammerula trupaneae”. Molecular Phylogenetics and Evolution 90:67–79.
Weckstein, J. 2004. Biogeography explains Cophylogenetic patterns in Toucan Chewing lice. Systematic Biology 53:154–164.
Wieseke, N., T. Hartmann, M. Bernt, and M. Middendorf. 2015. Cophylogenetic Reconciliation with ILP. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Yodpinyanee, A., B. Cousins, J. Peebles, T. Schramm, and R. Libeskind-Hadas. 2011. Faster Dynamic Programming Algorithms for the Cophylogeny Reconstruction Problem. HMC CS Technical Report.

Table 1: Biological systems considered in this analysis and the type of coevolutionary interrelationship expressed within said system.

Coevolutionary system                                       Type of coevolutionary interrelationship expressed
Acacia / Pseudomyrmex (Gómez-Acevedo et al. 2010)           Plant–Insect Mutualism
Aves / Syringophilopsis (Hendricks et al. 2013)             Bird–Mites Parasitism
Carex / Anthracoidea (Escudero 2015)                        Plant–Fungi Parasitism
Caryophyllaceae / Microbotryum (Refrégier et al. 2008)      Plant–Fungi Mutualism
Cichlidae / Platyhelminthes (Mendlova et al. 2012)          Fish–Flatworm Parasitism
Formicidae / Eucharitidae (Murray et al. 2013)              Ant–Wasp Parasitoidism
Goodeinae / Margotrema (Martínez-Aquino et al. 2014)        Fish–Flatworm Parasitism
Ficus / Agaonidae (McLeish and Van Noort 2012)              Plant–Insect Mutualism
Gastropoda / Schistosome (Lockyer et al. 2003)              Snails–Flatworm Parasitism
Geomyidae / Mallophaga (Hafner and Nadler 1988)             Rodent–Lice Parasitism
Mycocepurus smithii / Fungi (Kellner et al. 2013)           Ant–Fungal Mutualism
Ramphastidae / Mallophaga (Weckstein 2004)                  Bird–Lice Parasitism
Sigmodontinae / Arenaviridae (Jackson and Charleston 2004)  Rodent–Viral Coevolution
Teleostei / Copepods (Paterson and Poulin 1999)             Fish–Crustacean Parasitism
Tephritidae / Bacteria (Viale et al. 2015)                  Fly–Bacteria Symbiosis

Table 2: WiSPA's performance against Jane 4 over fifteen biological test cases. WiSPA was run with three different settings for spread: a cost of 1, a cost of 2, and with spread not permitted in the reconstruction.

Coevolutionary system           Recovered event cost and (
Acacia / Pseudomyrmex           67 (0)    28 (2)    43 (2)    65 (0)
Aves / Syringophilopsis         17 (9)    17 (9)    17 (9)    17 (9)
Carex / Anthracoidea            73 (9)    59 (9)    65 (10)   73 (9)
Caryophyllaceae / Microbotryum  33 (3)    26 (5)    30 (3)    33 (3)
Cichlidae / Platyhelminthes     40 (7)    34 (9)    39 (7)    39 (7)
Formicidae / Eucharitidae       12 (0)    8 (1)     9 (1)     9 (1)
Goodeinae / Margotrema          36 (2)    21 (4)    25 (4)    33 (2)
Ficus / Agaonidae               10 (3)    8 (4)     9 (4)     10 (3)
Gastropoda / Schistosome        122 (1)   54 (3)    77 (2)    120 (1)
Geomyidae / Mallophaga
Mycocepurus smithii / Fungi     42 (1)    21 (2)    28 (3)    41 (1)
Ramphastidae / Mallophaga       17 (2)    12 (2)    14 (3)    17 (2)
Sigmodontinae / Arenaviridae    15 (5)    15 (4)    15 (4)    15 (4)
Teleostei / Copepods
Tephritidae / Bacteria          29 (12)   29 (12)   29 (12)   29 (12)
Total                           526 (61)  342 (74)  412 (72)  512 (61)

[Figure 1 legend: Host Tree (H), Parasite Tree (P), Associations (ϕ); events: Failure to Diverge, Codivergence, Duplication, Host Switch, Loss]
Figure 1: A tanglegram (left) and one of its optimal maps (right). What is unique about this possible map, Φ, is that it includes all five evolutionary events applied within current cophylogeny mapping algorithms including Jane, CoRe-PA and CoRe-ILP.

[Figure 2 legend: (n) Loss Events, Failure to Diverge, Spread; Host Tree (H), Parasite Tree (P), Associations (ϕ)]

Figure 2: A tanglegram (left) and two Pareto optimal solutions using either failure-to-diverge (center) or spread (right).

Figure 3: A tanglegram (left) that demonstrates why current implementations of widespread parasitism reconciliation using cophylogeny mapping fail, along with two recovered maps: an optimal map (center) and the map recovered by Jane (right). The algorithm presented herein is the first proposed method capable of recovering the optimal map for this tanglegram.

Algorithm 1 ReconcileWidespreadParasite(H, P, ϕ, V, p)
    Φ is an array of lists which is worst case O(|P| × |H|)
    p is a homeomorphic sub-graph of h including the host leaves ∈ ϕ[p]
    L is a list of nodes in a
    Sort L in descending order based on each node's distance from the root of a
    for p_i ∈ L do
        if p_i is a leaf then
            Φ[p_i] ← leaf h_i ∈ H which p_i is associated with
        else
            l, r ← the left and right children of p_i
            for h_i ∈ Φ[l] do
                for h_j ∈ Φ[r] do
                    Φ[p_i][h_k] ← minimum cost event for p_i at node h_k
                end for
            end for
        end if
    end for
    return Φ

Figure 4: The ReconcileWidespreadParasite subroutine called from the WiSPA algorithm (see Figure 5). This method outlines the process to infer the optimal set of widespread events from a set of association trees, A.

Algorithm 2 WiSPA(H, P, ϕ, V)
    Φ is an array of lists
    ω is an array of dynamic programming tables
    L ← a list of nodes in P
    Sort the nodes in L by their distance from the root of P
    for p_i ∈ L do
        if p_i is a leaf then
            if p_i is widespread then
                ω[p_i] ← ReconcileWidespreadParasite(H, P, ϕ, V, p_i)
                Φ[p_i] ← ω[p_i][p_i]
            else
                Φ[p_i][h_i] ← leaf h_i ∈ H which p_i is associated with as defined in ϕ
            end if
        else
            l, r ← the left and right children of p_i
            for h_l ∈ Φ[l] do
                for h_r ∈ Φ[r] do
                    if h_l is a widespread mapping then
                        for h_l ∈ AllFeasibleMappingSites(H, P, Φ, Φ[h_l]) do
                            h_p ← minimum cost mapping site for p_i for children h_l and h_r
                            Φ[p_i][h_p] ← minimum cost event for p_i at node h_p
                        end for
                    end if
                    if h_r is a widespread mapping then
                        for h_r ∈ AllFeasibleMappingSites(H, P, Φ, Φ[h_r]) do
                            h_p ← minimum cost mapping site for p_i for children h_l and h_r
                            Φ[p_i][h_p] ← minimum cost event for p_i at node h_p
                        end for
                    end if
                    if h_l and h_r are not widespread mappings then
                        h_p ← minimum cost mapping site for p_i for children h_l and h_r
                        Φ[p_i][h_p] ← minimum cost event for p_i at node h_p
                    end if
                end for
            end for
        end if
    end for
    return Φ(P)

Figure 5: The WiSPA algorithm, which outlines the process for reconciling the optimal set of widespread and divergence events for a pair of phylogenetic trees (H and P), based on the known associations (ϕ) between the two trees.
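Both listings begin by sorting tree nodes by their distance from the root, so that the dynamic programming tables of a node's children are filled before the node itself is visited. The following is a minimal sketch of that ordering step only, not of the reconciliation itself. It assumes a tree stored as a child-to-parent dictionary, which is a hypothetical representation chosen for illustration; the helper names `depth` and `bottom_up_order` are likewise not from the paper.

```python
def depth(node, parent):
    """Number of edges between `node` and the root of the tree."""
    steps = 0
    while parent[node] is not None:
        node = parent[node]
        steps += 1
    return steps


def bottom_up_order(parent):
    """Return nodes sorted by descending distance from the root.

    `parent` maps each node to its parent, with the root mapped to None
    (an assumed encoding). Deeper nodes come first, so every child
    precedes its parent in the returned list.
    """
    return sorted(parent, key=lambda node: depth(node, parent), reverse=True)


# Small hypothetical parasite tree: root -> (x, y), x -> (a, b).
toy_tree = {"root": None, "x": "root", "y": "root", "a": "x", "b": "x"}
order = bottom_up_order(toy_tree)
print(order)
```

Recomputing the depth for every node is quadratic in the worst case; a single traversal from the root would give the same ordering more cheaply, but the version above keeps the sketch short.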
[Figure 6 panels: "Reported Total Cost" (y-axis: Total Parsimony Score) and "Number of Codivergence Events" (y-axis: Total Number of Codivergence Events); series: WiSPA (Spread = 1), WiSPA (Spread = 2), WiSPA (No Spread), Jane]

Figure 6: The results for the synthetic data sets. The first plot (left) considers the rate at which the total cost over 50 synthetic coevolutionary models increases as the rate of widespread parasitism is increased, while the second plot (right) considers the rate at which the total number of codivergence events over 50 synthetic coevolutionary models decreases as the rate of widespread parasitism is increased.

[Figure 7 labels: hosts gibbon, human, chimpanzee, gorilla, orangutan; parasites verm., anth., lero., buck.]
Figure 7: Primate – Enterobius tanglegram adapted from Brooks and Glen (1982), and Ronquist (1997), where the widespread associations are marked in red.

[Figure 8 labels (both maps): hosts gibbon, human, chimpanzee, gorilla, orangutan; parasites verm., verm., anth., lero., buck.]

Figure 8: Two optimal maps recovered for the Primate / Pinworms data set. The first map (left) is the optimal reconstruction inferred using Jane, while the second map (right) is the optimal reconstruction inferred by WiSPA.

[Figure 9 panels (a), (b), (c): "Distribution of Costs of Random Sample Solutions"; y-axis: Frequency]

Figure 9: Results from the Bernoulli trials where 10000 replicates were run. Plot (left) records the distribution of the optimal reconstruction inferred using Jane, while plots (center) and (right) record the distribution of the optimal reconstruction inferred by WiSPA for the cost scheme V = (0 , , , , , 1) and V = (0 ,
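The Wilson score interval applied to the Bernoulli trials of Figure 9 can be computed directly from a success count and a number of replicates. The following is a minimal sketch under stated assumptions: a "success" is taken to be a randomised system whose optimal cost is no greater than that of the observed system, and the count used in the example call is hypothetical, since the values reported in the text did not survive extraction. The function name `wilson_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = successes / trials
    denominator = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denominator
    half_width = (
        z
        * sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials))
        / denominator
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical count, for illustration only: 25 of 10000 randomised
# systems scoring at least as well as the observed system.
low, high = wilson_interval(successes=25, trials=10000)
print(f"{low:.5f} to {high:.5f}")
```

Unlike the normal approximation, the Wilson interval remains well behaved when the observed proportion is at or near zero, which is the regime of interest when very few randomised systems match the observed cost.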