Publication


Featured research published by Guy Baele.


Molecular Biology and Evolution | 2012

Improving the Accuracy of Demographic and Molecular Clock Model Comparison While Accommodating Phylogenetic Uncertainty

Guy Baele; Philippe Lemey; Trevor Bedford; Andrew Rambaut; Marc A. Suchard; Alexander V. Alekseyenko

Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. We here investigate the performance of some of these new marginal likelihood estimators, specifically, path sampling (PS) and stepping-stone (SS) sampling for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), abbreviated AICM, a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed.
The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
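
The contrast between the HME and stepping-stone sampling can be illustrated on a toy problem. The following is a minimal sketch, not the BEAST implementation: it assumes a normal model with known standard deviation and a normal prior on the mean, so that the power posteriors can be sampled exactly and the true log marginal likelihood is available in closed form. All function names, the number of steps, and the Beta(0.3, 1) spacing of the powers are illustrative assumptions.

```python
import math
import random


def log_likelihood(mu, data, sigma):
    """Log-likelihood of i.i.d. Normal(mu, sigma^2) observations."""
    n = len(data)
    sq = sum((y - mu) ** 2 for y in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)


def exact_log_marginal(data, sigma, m0, s0):
    """Closed-form log marginal likelihood under a Normal(m0, s0^2) prior on mu."""
    n = len(data)
    resid = [y - m0 for y in data]
    quad = sum(r * r for r in resid) - s0 ** 2 * sum(resid) ** 2 / (sigma ** 2 + n * s0 ** 2)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - 0.5 * math.log(1 + n * s0 ** 2 / sigma ** 2)
            - quad / (2 * sigma ** 2))


def sample_power_posterior(beta, data, sigma, m0, s0, n_draws, rng):
    """Exact draws from the power posterior (likelihood^beta times prior)."""
    precision = 1 / s0 ** 2 + beta * len(data) / sigma ** 2
    mean = (m0 / s0 ** 2 + beta * sum(data) / sigma ** 2) / precision
    sd = math.sqrt(1 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def stepping_stone(data, sigma, m0, s0, n_steps=32, n_draws=2000, rng=None):
    """Stepping-stone estimate: sum of log ratios between adjacent power posteriors."""
    rng = rng or random.Random(1)
    # Powers placed at evenly spaced quantiles of a Beta(0.3, 1) distribution (assumed choice).
    betas = [(k / n_steps) ** (1 / 0.3) for k in range(n_steps + 1)]
    total = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        draws = sample_power_posterior(b_prev, data, sigma, m0, s0, n_draws, rng)
        total += log_mean_exp([(b_next - b_prev) * log_likelihood(mu, data, sigma)
                               for mu in draws])
    return total


def harmonic_mean(data, sigma, m0, s0, n_draws=2000, rng=None):
    """Harmonic mean estimate from posterior draws only (beta = 1)."""
    rng = rng or random.Random(1)
    draws = sample_power_posterior(1.0, data, sigma, m0, s0, n_draws, rng)
    return -log_mean_exp([-log_likelihood(mu, data, sigma) for mu in draws])


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(1.0, 1.0) for _ in range(20)]
    print("exact         ", exact_log_marginal(data, 1.0, 0.0, 10.0))
    print("stepping-stone", stepping_stone(data, 1.0, 0.0, 10.0))
    print("harmonic mean ", harmonic_mean(data, 1.0, 0.0, 10.0))
```

With a diffuse prior such as the one assumed here, the harmonic mean estimate is largely insensitive to the prior and so tends to land above the exact value, whereas the stepping-stone estimate should track it closely at the cost of sampling from many power posteriors.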


Molecular Biology and Evolution | 2012

Accurate Model Selection of Relaxed Molecular Clocks in Bayesian Phylogenetics

Guy Baele; Wai Lok Sibon Li; Alexei J. Drummond; Marc A. Suchard; Philippe Lemey

Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike's information criterion through Markov chain Monte Carlo (AICM), in Bayesian model selection of demographic and molecular clock models. Almost simultaneously, a Bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may suffer when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets.
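
The Bayes factor in favor of the MAP model can be read off a model-averaging run as the ratio of posterior odds to prior odds. The sketch below assumes equal prior probabilities across the candidate models and takes a plain list of sampled model labels as input; the function name and the clock-model labels are hypothetical. It also shows why the estimate breaks down when the chain never leaves the MAP model: the posterior odds are then unbounded.

```python
import math
from collections import Counter


def map_model_bayes_factor(sampled_models, n_models):
    """Return (MAP model, its posterior probability, Bayes factor vs. all others).

    Assumes each of the n_models candidates has prior probability 1 / n_models,
    so the prior odds of any single model are 1 / (n_models - 1).
    """
    counts = Counter(sampled_models)
    map_model, map_count = counts.most_common(1)[0]
    n_other = len(sampled_models) - map_count
    posterior_prob = map_count / len(sampled_models)
    prior_odds = 1 / (n_models - 1)
    if n_other == 0:
        # The chain never visited another model: the estimate is unbounded.
        return map_model, posterior_prob, math.inf
    bayes_factor = (map_count / n_other) / prior_odds
    return map_model, posterior_prob, bayes_factor


if __name__ == "__main__":
    # Hypothetical model indicator trace with three candidate clock models.
    trace = ["lognormal"] * 90 + ["exponential"] * 7 + ["gamma"] * 3
    print(map_model_bayes_factor(trace, n_models=3))
```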


Science | 2014

The early spread and epidemic ignition of HIV-1 in human populations

Nuno Rodrigues Faria; Andrew Rambaut; Marc A. Suchard; Guy Baele; Trevor Bedford; Melissa J. Ward; Andrew J. Tatem; Joao Sousa; Nimalan Arinaminpathy; Jacques Pépin; David Posada; Martine Peeters; Oliver G. Pybus; Philippe Lemey

The hidden history of the HIV pandemic
Rail and river transport in 1960s Congo, combined with the sexual revolution and changes in health care practices, primed the HIV pandemic. Faria et al. unpick the circumstances surrounding the ascendancy of HIV from its origins before 1920 in chimpanzee hunters in the Cameroon to amplification in Kinshasa. Around 1960, rail links promoted the spread of the virus to mining areas in southeastern Congo and beyond. Ultimately, HIV crossed the Atlantic in Haitian teachers returning home. From those early events, a pandemic was born.
Science, this issue p. 56
The early history of HIV centered on Kinshasa before accelerating in 1960 as a result of seismic social change after independence.
Thirty years after the discovery of HIV-1, the early transmission, dissemination, and establishment of the virus in human populations remain unclear. Using statistical approaches applied to HIV-1 sequence data from central Africa, we show that from the 1920s Kinshasa (in what is now the Democratic Republic of Congo) was the focus of early transmission and the source of pre-1960 pandemic viruses elsewhere. Location and dating estimates were validated using the earliest HIV-1 archival sample, also from Kinshasa. The epidemic histories of HIV-1 group M and nonpandemic group O were similar until ~1960, after which group M underwent an epidemiological transition and outpaced regional population growth. Our results reconstruct the early dynamics of HIV-1 and emphasize the role of social changes and transport networks in the establishment of this virus in human populations.


PLOS Pathogens | 2014

Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2

Philippe Lemey; Andrew Rambaut; Trevor Bedford; Nuno Rodrigues Faria; Filip Bielejec; Guy Baele; Colin A. Russell; Derek J. Smith; Oliver G. Pybus; Dirk Brockmann; Marc A. Suchard

Information on global human movement patterns is central to spatial epidemiological models used to predict the behavior of influenza and other infectious diseases. Yet it remains difficult to test which modes of dispersal drive pathogen spread at various geographic scales using standard epidemiological data alone. Evolutionary analyses of pathogen genome sequences increasingly provide insights into the spatial dynamics of influenza viruses, but to date they have largely neglected the wealth of information on human mobility, mainly because no statistical framework exists within which viral gene sequences and empirical data on host movement can be combined. Here, we address this problem by applying a phylogeographic approach to elucidate the global spread of human influenza subtype H3N2 and assess its ability to predict the spatial spread of human influenza A viruses worldwide. Using a framework that estimates the migration history of human influenza while simultaneously testing and quantifying a range of potential predictive variables of spatial spread, we show that the global dynamics of influenza H3N2 are driven by air passenger flows, whereas at more local scales spread is also determined by processes that correlate with geographic distance. Our analyses further confirm a central role for mainland China and Southeast Asia in maintaining a source population for global influenza diversity. By comparing model output with the known pandemic expansion of H1N1 during 2009, we demonstrate that predictions of influenza spatial spread are most accurate when data on human mobility and viral evolution are integrated. In conclusion, the global dynamics of influenza viruses are best explained by combining human mobility data with the spatial information inherent in sampled viral genomes. The integrated approach introduced here offers great potential for epidemiological surveillance through phylogeographic reconstructions and for improving predictive models of disease control.


Cell | 2016

Domestication and Divergence of Saccharomyces cerevisiae Beer Yeasts

Brigida Gallone; Jan Steensels; Troels Prahl; Leah Soriaga; Veerle Saels; Beatriz Herrera-Malaver; Adriaan Merlevede; Miguel Roncoroni; Karin Voordeckers; Loren Miraglia; Clotilde Teiling; Brian Steffy; Maryann Taylor; Ariel Schwartz; Toby Richardson; Christopher White; Guy Baele; Steven Maere; Kevin J. Verstrepen

Summary: Whereas domestication of livestock, pets, and crops is well documented, it is still unclear to what extent microbes associated with the production of food have also undergone human selection and where the plethora of industrial strains originates from. Here, we present the genomes and phenomes of 157 industrial Saccharomyces cerevisiae yeasts. Our analyses reveal that today’s industrial yeasts can be divided into five sublineages that are genetically and phenotypically separated from wild strains and originate from only a few ancestors through complex patterns of domestication and local divergence. Large-scale phenotyping and genome analysis further show strong industry-specific selection for stress tolerance, sugar utilization, and flavor production, while the sexual cycle and other phenotypes related to survival in nature show decay, particularly in beer yeasts. Together, these results shed light on the origins, evolutionary history, and phenotypic diversity of industrial yeasts and provide a resource for further selection of superior strains.


BMC Bioinformatics | 2013

Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution

Guy Baele; Philippe Lemey; Stijn Vansteelandt

Background: Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model’s marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes.
Results: We here assess the original ‘model-switch’ path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model’s marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process.
Conclusions: We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison.
Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational effort in order to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
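
For orientation, the identities behind direct Bayes factor estimation along a path between two models can be written compactly. This is a generic textbook form under assumed notation, not the specific sigmoid parameterization studied in the article: $M_0$ and $M_1$ share a parameter space $\theta$, $Y$ denotes the data, and $0 = \beta_0 < \beta_1 < \dots < \beta_K = 1$ is a grid of path values.

```latex
% Unnormalized density along the path between the two models
q_\beta(\theta) = \bigl[p(Y \mid \theta, M_1)\, p(\theta \mid M_1)\bigr]^{\beta}
                  \bigl[p(Y \mid \theta, M_0)\, p(\theta \mid M_0)\bigr]^{1-\beta}

% Model-switch path sampling: integrate the expected log ratio along the path
\log B_{10} = \int_0^1 \mathbb{E}_{\beta}\!\left[
    \log \frac{p(Y \mid \theta, M_1)\, p(\theta \mid M_1)}
              {p(Y \mid \theta, M_0)\, p(\theta \mid M_0)} \right] d\beta

% Stepping-stone form: product of ratios between adjacent path values
B_{10} = \prod_{k=1}^{K} \mathbb{E}_{\beta_{k-1}}\!\left[
    \left( \frac{p(Y \mid \theta, M_1)\, p(\theta \mid M_1)}
                {p(Y \mid \theta, M_0)\, p(\theta \mid M_0)} \right)^{\beta_k - \beta_{k-1}} \right]
```

Here $\mathbb{E}_{\beta}$ denotes an expectation under the normalized density proportional to $q_\beta$, approximated in practice by MCMC samples at each path value.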


Nature | 2017

Virus genomes reveal factors that spread and sustained the Ebola epidemic

Gytis Dudas; Luiz Max Carvalho; Trevor Bedford; Andrew J. Tatem; Guy Baele; Nuno Rodrigues Faria; Daniel J. Park; Jason T. Ladner; Armando Arias; Danny A. Asogun; Filip Bielejec; Sarah Caddy; Matthew Cotten; Jonathan D’ambrozio; Simon Dellicour; Antonino Di Caro; Joseph W. Diclaro; Sophie Duraffour; Michael J. Elmore; Lawrence S. Fakoli; Ousmane Faye; Merle L. Gilbert; Sahr M. Gevao; Stephen K. Gire; Adrianne Gladden-Young; Andreas Gnirke; Augustine Goba; Donald S. Grant; Bart L. Haagmans; Julian A. Hiscox

The 2013–2016 West African epidemic caused by the Ebola virus was of unprecedented magnitude, duration and impact. Here we reconstruct the dispersal, proliferation and decline of Ebola virus throughout the region by analysing 1,610 Ebola virus genomes, which represent over 5% of the known cases. We test the association of geography, climate and demography with viral movement among administrative regions, inferring a classic ‘gravity’ model, with intense dispersal between larger and closer populations. Despite attenuation of international dispersal after border closures, cross-border transmission had already sown the seeds for an international epidemic, rendering these measures ineffective at curbing the epidemic. We address why the epidemic did not spread into neighbouring countries, showing that these countries were susceptible to substantial outbreaks but at lower risk of introductions. Finally, we reveal that this large epidemic was a heterogeneous and spatially dissociated collection of transmission clusters of varying size, duration and connectivity. These insights will help to inform interventions in future epidemics.
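
A 'gravity' model in its simplest textbook form makes the movement rate between two locations increase with both population sizes and decrease with the distance between them. The sketch below is a log-linear illustration of that form only, not the phylogeographic model fitted in the article; the function name, coefficient values, and units are assumptions chosen for illustration.

```python
import math


def gravity_log_rate(pop_origin, pop_dest, distance_km,
                     intercept=0.0, b_origin=1.0, b_dest=1.0, b_distance=-1.0):
    """Log movement rate under a simple log-linear gravity model.

    rate is proportional to pop_origin^b_origin * pop_dest^b_dest * distance_km^b_distance;
    the default coefficients are placeholders, not estimates.
    """
    return (intercept
            + b_origin * math.log(pop_origin)
            + b_dest * math.log(pop_dest)
            + b_distance * math.log(distance_km))


if __name__ == "__main__":
    # Larger and closer populations exchange lineages at a higher rate.
    print(gravity_log_rate(1_000_000, 1_000_000, 50))
    print(gravity_log_rate(100_000, 1_000_000, 50))
    print(gravity_log_rate(1_000_000, 1_000_000, 500))
```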


Bioinformatics | 2013

Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency

Guy Baele; Philippe Lemey

Motivation: The advent of new sequencing technologies has led to increasing amounts of data being available to perform phylogenetic analyses, with genomic data giving rise to the field of phylogenomics. High-performance computing is becoming an indispensable research tool to fit complex evolutionary models, which take into account specific genomic properties, to large datasets. Here, we perform an extensive Bayesian phylogenetic model selection study, comparing codon and nucleotide substitution models, including codon position partitioning for nucleotide data as well as gene-specific substitution models for both data types. For the best fitting partitioned models, we also compare independent partitioning with standard diffuse prior specification to conditional partitioning via hierarchical prior specification. To compare the different models, we use state-of-the-art marginal likelihood estimation techniques, including path sampling and stepping-stone sampling.
Results: We show that a full codon model best describes the features of a whole mitochondrial genome dataset, consisting of 12 protein-coding genes, but only when each gene is allowed to evolve under a separate codon model. However, when using hierarchical prior specification for the partition-specific parameters instead of independent diffuse priors, codon position partitioned nucleotide models can still outperform standard codon models. We demonstrate the feasibility of fitting such a combination of complex models using the BEAGLE library for BEAST in combination with recent graphics cards. We argue that development and use of such models needs to be accompanied by state-of-the-art marginal likelihood estimators because the more traditional and computationally less demanding estimators do not offer adequate accuracy.


PLOS Computational Biology | 2014

The genealogical population dynamics of HIV-1 in a large transmission chain: bridging within and among host evolutionary rates

Bram Vrancken; Andrew Rambaut; Marc A. Suchard; Alexei J. Drummond; Guy Baele; Inge Derdelinckx; Eric Van Wijngaerden; Anne-Mieke Vandamme; Kristel Van Laethem; Philippe Lemey

Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution within and among hosts and separates distinct selective pressures that impose differences in both the mode of diversification and the tempo of evolution. In the absence of comprehensive direct comparative analyses of the evolutionary processes at different biological scales, our understanding of how fast within-host HIV-1 evolutionary rates translate to lower rates at the between host level remains incomplete. Here, we address this by analyzing pol and env data from a large HIV-1 subtype C transmission chain for which both the timing and the direction are known for most transmission events. To this purpose, we develop a new transmission model in a Bayesian genealogical inference framework and demonstrate how to constrain the viral evolutionary history to be compatible with the transmission history while simultaneously inferring the within-host evolutionary and population dynamics. We show that accommodating a transmission bottleneck affords the best fit to our data, but the sparse within-host HIV-1 sampling prevents accurate quantification of the concomitant loss in genetic diversity. We draw inference under the transmission model to estimate HIV-1 evolutionary rates among epidemiologically-related patients and demonstrate that they lie in between fast intra-host rates and lower rates among epidemiologically unrelated individuals infected with HIV subtype C. Using a new molecular clock approach, we quantify and find support for a lower evolutionary rate along branches that accommodate a transmission event or branches that represent the entire backbone of transmitted lineages in our transmission history.
Finally, we recover the rate differences at the different biological scales for both synonymous and non-synonymous substitution rates, which is only compatible with the ‘store and retrieve’ hypothesis positing that viruses stored early in latently infected cells preferentially transmit or establish new infections upon reactivation.


Molecular Biology and Evolution | 2016

SpreaD3: Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes

Filip Bielejec; Guy Baele; Bram Vrancken; Marc A. Suchard; Andrew Rambaut; Philippe Lemey

Model-based phylogenetic reconstructions increasingly consider spatial or phenotypic traits in conjunction with sequence data to study evolutionary processes. Alongside parameter estimation, visualization of ancestral reconstructions represents an integral part of these analyses. Here, we present a complete overhaul of the spatial phylogenetic reconstruction of evolutionary dynamics software, now called SpreaD3 to emphasize the use of data-driven documents, as an analysis and visualization package that primarily complements Bayesian inference in BEAST (http://beast.bio.ed.ac.uk, last accessed 9 May 2016). The integration of JavaScript D3 libraries (www.d3.org, last accessed 9 May 2016) offers novel interactive web-based visualization capacities that are not restricted to spatial traits and extend to any discrete or continuously valued trait for any organism of interest.

Collaboration


Top Co-Authors

Philippe Lemey, Katholieke Universiteit Leuven
Filip Bielejec, Katholieke Universiteit Leuven
Bram Vrancken, Katholieke Universiteit Leuven
Trevor Bedford, Fred Hutchinson Cancer Research Center
Anne-Mieke Vandamme, Rega Institute for Medical Research