Michael Golden
University of Cape Town
Publication
Featured research published by Michael Golden.
Virus Evolution | 2015
Darren P. Martin; Ben Murrell; Michael Golden; Arjun Khoosal; Brejnev Muhire
RDP4 is the latest version of the Recombination Detection Program (RDP), a Windows computer program that implements an extensive array of methods for detecting and visualising recombination in, and stripping evidence of recombination from, virus genome sequence alignments. RDP4 is capable of analysing twice as many sequences (up to 2,500) that are up to three times longer (up to 10 Mb) than those that could be analysed by older versions of the program. RDP4 is therefore also applicable to the analysis of bacterial full-genome sequence datasets. Other novelties in RDP4 include (1) the capacity to differentiate between recombination and genome segment reassortment, (2) the estimation of recombination breakpoint confidence intervals, (3) a variety of ‘recombination aware’ phylogenetic tree construction and comparison tools, (4) new matrix-based visualisation tools for examining both individual recombination events and the overall phylogenetic impacts of multiple recombination events and (5) new tests to detect the influences of gene arrangements, encoded protein structure, nucleic acid secondary structure, nucleotide composition, and nucleotide diversity on recombination breakpoint patterns. The key feature of RDP4 that differentiates it from other recombination detection tools is its flexibility. It can be run either in fully automated mode from the command line interface or with a graphically rich user interface that enables detailed exploration of both individual recombination events and overall recombination patterns.
Viruses | 2011
Darren P. Martin; Philippe Biagini; Pierre Lefeuvre; Michael Golden; Philippe Roumagnac; Arvind Varsani
Although single stranded (ss) DNA viruses that infect humans and their domesticated animals do not generally cause major diseases, the arthropod borne ssDNA viruses of plants do, and as a result seriously constrain food production in most temperate regions of the world. Besides the well known plant and animal-infecting ssDNA viruses, it has recently become apparent through metagenomic surveys of ssDNA molecules that there also exist large numbers of other diverse ssDNA viruses within almost all terrestrial and aquatic environments. The host ranges of these viruses probably span the tree of life and they are likely to be important components of global ecosystems. Various lines of evidence suggest that a pivotal evolutionary process during the generation of this global ssDNA virus diversity has probably been genetic recombination. High rates of homologous recombination, non-homologous recombination and genome component reassortment are known to occur within and between various different ssDNA virus species and we look here at the various roles that these different types of recombination may play, both in the day-to-day biology, and in the longer term evolution, of these viruses. We specifically focus on the ecological, biochemical and selective factors underlying patterns of genetic exchange detectable amongst the ssDNA viruses and discuss how these should all be considered when assessing the adaptive value of recombination during ssDNA virus evolution.
Proceedings of the National Academy of Sciences of the United States of America | 2015
David M. Mauger; Michael Golden; Daisuke Yamane; Sara E. Williford; Stanley M. Lemon; Darren P. Martin; Kevin M. Weeks
Significance: Plus-sense RNA viruses cause diverse pathologies in humans. Viral RNA genomes are selected to encode information both in their primary sequences and in their higher-order tertiary structures required to replicate and to evade host immune responses. We interrogated the physical structures of three evolutionarily divergent hepatitis C virus (HCV) RNA genomes using high-throughput chemical probing and found, along with all previously known RNA-structure–based regulatory elements, diverse previously uncharacterized structures that impact viral replication. We also characterized strategies by which the HCV genomic RNA structure masks detection by innate immune sensors. This structure-first strategy for comparative analysis of genome-wide RNA structure can be broadly applied to understand the contributions of higher-order genome structure to viral replication and pathogenicity.
Abstract: Hepatitis C virus (HCV) infects over 170 million people worldwide and is a leading cause of liver disease and cancer. The virus has a 9,650-nt, single-stranded, messenger-sense RNA genome that is infectious as an independent entity. The RNA genome has evolved in response to complex selection pressures, including the need to maintain structures that facilitate replication and to avoid clearance by cell-intrinsic immune processes. Here we used high-throughput, single-nucleotide resolution information to generate and functionally test data-driven structural models for three diverse HCV RNA genomes. We identified, de novo, multiple regions of conserved RNA structure, including all previously characterized cis-acting regulatory elements and also multiple novel structures required for optimal viral fitness. Well-defined RNA structures in the central regions of HCV genomes appear to facilitate persistent infection by masking the genome from RNase L and double-stranded RNA-induced innate immune sensors. This work shows how structure-first comparative analysis of entire genomes of a pathogenic RNA virus enables comprehensive and concise identification of regulatory elements and emphasizes the extensive interrelationships among RNA genome structure, viral biology, and innate immune responses.
Journal of Virology | 2014
Brejnev Muhire; Michael Golden; Ben Murrell; Pierre Lefeuvre; Jean Michel Lett; Alistair J. A. Gray; Art Y F Poon; Nobubelo Ngandu; Yves Semegni; Emil Pavlov Tanov; Adérito L. Monjane; Gordon William Harkins; Arvind Varsani; Dionne N. Shepherd; Darren P. Martin
Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.
Journal of General Virology | 2014
Tomasz Stenzel; Tomasz Piasecki; Klaudia Chrząstek; Laurel Julian; Brejnev Muhire; Michael Golden; Darren P. Martin; Arvind Varsani
Pigeon circovirus (PiCV) has a ~2 kb circular ssDNA genome. All but one of the known PiCV isolates have been found infecting pigeons in various parts of the world. In this study, we screened 324 swab and tissue samples from Polish pigeons and recovered 30 complete genomes, 16 of which came from birds displaying no obvious pathology. Together with 17 other publicly available PiCV complete genomes sampled throughout the Northern Hemisphere and Australia, we find that PiCV displays a similar degree of genetic diversity to that of the related psittacine-infecting circovirus species, beak and feather disease virus (BFDV). We show that, as is the case with its pathology and epidemiology, PiCV also displays patterns of recombination, genomic secondary structure and natural selection that are generally very similar to those of BFDV. It is likely that breeding facilities play a significant role in the emergence of new recombinant PiCV variants and given that ~50% of the domestic pigeon population is infected subclinically, all pigeon breeding stocks should be screened routinely for this virus.
PLOS ONE | 2014
Michael Golden; Brejnev Muhire; Yves Semegni; Darren P. Martin
Genetic recombination is a major contributor to the ongoing diversification of HIV. It is clearly apparent that across the HIV-genome there are defined recombination hot and cold spots which tend to co-localise both with genomic secondary structures and with either inter-gene boundaries or intra-gene domain boundaries. There is also good evidence that most recombination breakpoints that are detectable within the genes of natural HIV recombinants are likely to be minimally disruptive of intra-protein amino acid contacts and that these breakpoints should therefore have little impact on protein folding. Here we further investigate the impact on patterns of genetic recombination in HIV of selection favouring the maintenance of functional RNA and protein structures. We confirm that chimaeric Gag p24, reverse transcriptase, integrase, gp120 and Nef proteins that are expressed by natural HIV-1 recombinants have significantly lower degrees of predicted folding disruption than randomly generated recombinants. Similarly, we use a novel single-stranded RNA folding disruption test to show that there is significant, albeit weak, evidence that natural HIV recombinants tend to have genomic secondary structures that more closely resemble parental structures than do randomly generated recombinants. These results are consistent with the hypothesis that natural selection has acted both in the short term to purge recombinants with disrupted RNA and protein folds, and in the longer term to modify the genome architecture of HIV to ensure that recombination prone sites correspond with those where recombination will be minimally deleterious.
Bioinformatics | 2013
Michael Golden; Darren P. Martin
Motivation: DOOSS (Data Overlaid On Secondary Structures) is a tool for visualizing annotated secondary structures of large single-stranded nucleotide sequences (such as full-length virus genomes). The purpose of this tool is to assist investigators in evaluating the biological relevance of secondary structures within particular sequences.
Availability and implementation: DOOSS is written in Java and is available from: http://dooss.computingforbiology.org
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Molecular Biology and Evolution | 2017
Michael Golden; Eduardo García-Portugués; Michael Sørensen; Kanti V. Mardia; Thomas Hamelryck; Jotun Hein
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.
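As a rough illustration of what a random walk in dihedral angle space looks like, the sketch below simulates a driftless Gaussian walk of a (phi, psi) pair and wraps each angle back onto the torus. This is an assumption-laden toy, not the angular diffusion process introduced in the paper; the function names, step size and starting angles are hypothetical.

```python
import math
import random

def wrap_angle(x):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return ((x + math.pi) % (2 * math.pi)) - math.pi

def torus_random_walk(phi0, psi0, n_steps, step_sd=0.1, seed=0):
    """Simulate a driftless Gaussian random walk of (phi, psi) on the torus.

    Toy model only: each step adds independent Gaussian noise to both
    angles and wraps the result, so the walk never leaves the torus.
    """
    rng = random.Random(seed)
    phi, psi = wrap_angle(phi0), wrap_angle(psi0)
    path = [(phi, psi)]
    for _ in range(n_steps):
        phi = wrap_angle(phi + rng.gauss(0.0, step_sd))
        psi = wrap_angle(psi + rng.gauss(0.0, step_sd))
        path.append((phi, psi))
    return path

# Start near the edge of the angular range so that wrapping is exercised.
path = torus_random_walk(phi0=3.1, psi0=-3.1, n_steps=200)
```

Wrapping is the key design point: an angle that drifts past pi reappears near -pi, which a walk on an ordinary plane would not capture.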
BMC Bioinformatics | 2013
James W. J. Anderson; Ádám Novák; Zsuzsanna Sükösd; Michael Golden; Preeti Arunapuram; Ingolfur Edvardsson; Jotun Hein
Background: With the advancement of next-generation sequencing and transcriptomics technologies, regulatory effects involving RNA, in particular RNA structural changes are being detected. These results often rely on RNA secondary structure predictions. However, current approaches to RNA secondary structure modelling produce predictions with a high variance in predictive accuracy, and we have little quantifiable knowledge about the reasons for these variances.
Results: In this paper we explore a number of factors which can contribute to poor RNA secondary structure prediction quality. We establish a quantified relationship between alignment quality and loss of accuracy. Furthermore, we define two new measures to quantify uncertainty in alignment-based structure predictions. One of the measures improves on the “reliability score” reported by PPfold, and considers alignment uncertainty as well as base-pair probabilities. The other measure considers the information entropy for SCFGs over a space of input alignments.
Conclusions: Our predictive accuracy improves on the PPfold reliability score. We can successfully characterize many of the underlying reasons for and variances in poor prediction. However, there is still variability unaccounted for, which we therefore suggest comes from the RNA secondary structure predictive model itself.
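To give a feel for how entropy can quantify uncertainty in a structure prediction, the sketch below computes the Shannon entropy of a single position's pairing distribution from a base-pair probability matrix. It is a generic per-position illustration and should not be read as either of the two measures defined in the paper; the matrix layout and function name are assumptions.

```python
import math

def positional_entropy(pair_probs, i):
    """Shannon entropy (bits) of the pairing distribution at position i.

    Assumes pair_probs[i][j] is the probability that position i pairs
    with position j; any remaining probability mass is treated as the
    probability that position i is unpaired.
    """
    partner_probs = [p for j, p in enumerate(pair_probs[i]) if j != i]
    unpaired = max(0.0, 1.0 - sum(partner_probs))
    outcomes = partner_probs + [unpaired]
    return -sum(p * math.log2(p) for p in outcomes if p > 0)

# Hypothetical 3-position example: position 0 pairs with position 2
# half of the time, and position 1 is always unpaired.
probs = [
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
entropy_uncertain = positional_entropy(probs, 0)
entropy_certain = positional_entropy(probs, 1)
```

A position whose state is certain, whether paired or unpaired, scores zero, while a position split evenly between two outcomes scores one bit.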
bioRxiv | 2018
Michael Golden; Ben Murrell; Darren P. Martin; Jotun Hein
Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here we introduce a sequence evolution model, MESSI, that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution whilst accounting for an unknown secondary structure. MESSI can also use GPU parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), GU (GT in DNA) pairs in non-coding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and non-coding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses, suggesting that GT pairs do not stabilise DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degrees of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure and two corresponding alignments. We found that estimates of coevolution were more strongly correlated with experimentally-determined SHAPE-MaP pairing scores than three non-evolutionary measures of base-pairing covariation. To assist researchers in prioritising substructures with potential functionality, MESSI automatically ranks substructures by degrees of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures that have been previously identified as having structure-related functional importance, amongst several uncharacterised top-ranking substructures.
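For context, mutual information between two alignment columns is one widely used non-evolutionary measure of covariation. The abstract does not say which three measures were compared, so the sketch below is only an illustration of the general idea and is unrelated to MESSI's own model; the function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between alignment columns i and j.

    Rows with a gap ('-') in either column are skipped. Phylogeny is
    ignored, which is what makes this a non-evolutionary measure.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) / (p(a) * p(b)) simplifies to count * n / (left * right).
        mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return mi

# Toy alignment: columns 0 and 2 covary as complementary pairs,
# while column 1 is invariant.
toy_alignment = ["GAC", "CAG", "AAU", "UAA"]
mi_covarying = column_mutual_information(toy_alignment, 0, 2)
mi_invariant = column_mutual_information(toy_alignment, 0, 1)
```

Because this score treats sequences as independent samples, shared ancestry alone can inflate it, which is one motivation for model-based approaches that account for the phylogeny.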