Nick Goldman
European Bioinformatics Institute
Publication
Featured research published by Nick Goldman.
Systematic Biology | 2000
Nick Goldman; Jon P. Anderson; Allen G. Rodrigo
Likelihood-based statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino-Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino-Hasegawa test place important restrictions on its applicability. In particular, it is only valid when the topologies being compared are specified a priori. Unfortunately, this means that the Kishino-Hasegawa test may be severely biased in many cases in which it is now commonly used: for example, in any case in which one of the competing topologies has been selected for testing because it is the maximum likelihood topology for the data set at hand. We review the theory of the Kishino-Hasegawa test and contend that for the majority of popular applications this test should not be used. Previously published results from invalid applications of the Kishino-Hasegawa test should be treated extremely cautiously, and future applications should use appropriate alternative tests instead. We review such alternative tests, both nonparametric and parametric, and give two examples which illustrate the importance of our contentions.
Journal of Molecular Evolution | 1993
Nick Goldman
Summary: Penny et al. have written that “The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement.” The ability to reject models is of such great importance because the results of all phylogenetic analyses depend on their underlying models: to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests.
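As a rough illustration of how Monte Carlo simulation can assign a significance level to a test statistic, the sketch below compares an observed value with values recomputed on data sets simulated under the null model. It is a generic outline, not the paper's implementation: the function names and the toy null statistic are hypothetical stand-ins, and a real analysis would simulate sequence data under the fitted substitution model and recompute the statistic on each replicate.

```python
import random


def monte_carlo_p_value(observed, simulated):
    """Upper-tail Monte Carlo p-value.

    Counts simulated null statistics at least as large as the observed one;
    the +1 terms treat the observed value as one more draw from the null.
    """
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


def simulate_null_statistics(simulate_once, n_replicates, seed=0):
    """Draw n_replicates statistics from a user-supplied null simulator."""
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(n_replicates)]


def toy_statistic(rng):
    # Hypothetical stand-in for "simulate a data set under the null model
    # and recompute the test statistic": a sum of five squared normals.
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(5))


null_stats = simulate_null_statistics(toy_statistic, n_replicates=999)
p_value = monte_carlo_p_value(observed=25.0, simulated=null_stats)
print(f"Monte Carlo p-value: {p_value:.3f}")
```

With 999 replicates the smallest attainable p-value is 0.001, so the number of simulations limits the resolution of the test.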
Nature | 2011
Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F. Lin; Brian J. Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D. Ward; Craig B. Lowe; Alisha K. Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J. Hubisz; David B. Jaffe; Irwin Jungreis; W. James Kent; Dennis Kostka; Marcia Lara
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.
Science | 2008
Ari Löytynoja; Nick Goldman
Genetic sequence alignment is the basis of many evolutionary and comparative studies, and errors in alignments lead to errors in the interpretation of evolutionary information in genomes. Traditional multiple sequence alignment methods disregard the phylogenetic implications of gap patterns that they create and infer systematically biased alignments with excess deletions and substitutions, too few insertions, and implausible insertion-deletion–event histories. We present a method that prevents these systematic errors by recognizing insertions and deletions as distinct evolutionary events. We show theoretically and practically that this improves the quality of sequence alignments and downstream analyses over a wide range of realistic alignment problems. These results suggest that insertions and sequence turnover are more common than is currently thought and challenge the conventional picture of sequence evolution and mechanisms of functional and structural changes.
Nature | 2012
Aylwyn Scally; Julien Y. Dutheil; LaDeana W. Hillier; Gregory Jordan; Ian Goodhead; Javier Herrero; Asger Hobolth; Tuuli Lappalainen; Thomas Mailund; Tomas Marques-Bonet; Shane McCarthy; Stephen H. Montgomery; Petra C. Schwalie; Y. Amy Tang; Michelle C. Ward; Yali Xue; Bryndis Yngvadottir; Can Alkan; Lars Nørvang Andersen; Qasim Ayub; Edward V. Ball; Kathryn Beal; Brenda J. Bradley; Yuan Chen; Chris Clee; Stephen Fitzgerald; Tina Graves; Yong Gu; Paul Heath; Andreas Heger
Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human–chimpanzee and human–chimpanzee–gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
Trends in Genetics | 2001
Simon Whelan; Pietro Liò; Nick Goldman
As the amount of molecular sequence data in the public domain grows, so does the range of biological topics that it influences through evolutionary considerations. In recent years, a number of developments have enabled molecular phylogenetic methodology to keep pace. Likelihood-based inferential techniques, although controversial in the past, lie at the heart of these new methods and are producing the promised advances in the understanding of sequence evolution. They allow both a wide variety of phylogenetic inferences from sequence data and robust statistical assessment of all results. It cannot remain acceptable to use outdated data analysis techniques when superior alternatives exist. Here, we discuss the most important and exciting methods currently available to the molecular phylogeneticist.
Nature Methods | 2013
Pär G. Engström; Tamara Steijger; Botond Sipos; Gregory R. Grant; André Kahles; Gunnar Rätsch; Nick Goldman; Tim Hubbard; Jennifer Harrow; Roderic Guigó; Paul Bertone
High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.
Nature | 2013
Nick Goldman; Paul Bertone; Siyuan Chen; Christophe Dessimoz; Emily LeProust; Botond Sipos; Ewan Birney
Digital production, transmission and storage have revolutionized how we access and use information but have also made archiving an increasingly complex task that requires active, continuing maintenance of digital media. This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information or were not amenable to scaling-up, and used no robust error-correction and lacked examination of their cost-efficiency for large-scale information archival. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10⁶ bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.
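The abstract reports an estimated Shannon information for the encoded files but does not say how it was estimated. The sketch below shows one simple possibility, a zeroth-order estimate from byte frequencies; this choice of estimator and the function name `shannon_bits` are assumptions made for illustration, not the authors' procedure.

```python
import math
from collections import Counter


def shannon_bits(data: bytes) -> float:
    """Zeroth-order Shannon information of a byte string, in bits.

    Computes the entropy per byte, H = -sum(p * log2(p)), from observed
    byte frequencies and multiplies it by the number of bytes.
    """
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    entropy_per_byte = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy_per_byte * n


sample = b"abab"
print(shannon_bits(sample))  # two equally frequent symbols: 1 bit per byte
```

For scale, 739 kilobytes corresponds to roughly 5.9 × 10⁶ raw bits if a kilobyte is taken as 1000 bytes, so an information estimate below the raw size is what one would expect for files containing some redundancy.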
BMC Bioinformatics | 2010
Ari Löytynoja; Nick Goldman
Background: Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites.
Results: We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to de novo alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at http://tinyurl.com/webprank.
Conclusions: The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence alignment and allows the performance of all alignment-related activity for small sequence analysis projects using only a standard web browser.
Proceedings of the National Academy of Sciences of the United States of America | 2012
Kate Schroder; Katharine M. Irvine; Martin S. Taylor; Nilesh J. Bokil; Kim-Anh Lê Cao; Kelly-Anne Masterman; Larisa I. Labzin; Colin A. Semple; Ronan Kapetanovic; Lynsey Fairbairn; Altuna Akalin; Geoffrey J. Faulkner; John Kenneth Baillie; Milena Gongora; Carsten O. Daub; Hideya Kawaji; Geoffrey J. McLachlan; Nick Goldman; Sean M. Grimmond; Piero Carninci; Harukazu Suzuki; Yoshihide Hayashizaki; Boris Lenhard; David A. Hume; Matthew J. Sweet
Evolutionary change in gene expression is generally considered to be a major driver of phenotypic differences between species. We investigated innate immune diversification by analyzing interspecies differences in the transcriptional responses of primary human and mouse macrophages to the Toll-like receptor (TLR)–4 agonist lipopolysaccharide (LPS). By using a custom platform permitting cross-species interrogation coupled with deep sequencing of mRNA 5′ ends, we identified extensive divergence in LPS-regulated orthologous gene expression between humans and mice (24% of orthologues were identified as “divergently regulated”). We further demonstrate concordant regulation of human-specific LPS target genes in primary pig macrophages. Divergently regulated orthologues were enriched for genes encoding cellular “inputs” such as cell surface receptors (e.g., TLR6, IL-7Rα) and functional “outputs” such as inflammatory cytokines/chemokines (e.g., CCL20, CXCL13). Conversely, intracellular signaling components linking inputs to outputs were typically concordantly regulated. Functional consequences of divergent gene regulation were confirmed by showing LPS pretreatment boosts subsequent TLR6 responses in mouse but not human macrophages, in keeping with mouse-specific TLR6 induction. Divergently regulated genes were associated with a large dynamic range of gene expression, and specific promoter architectural features (TATA box enrichment, CpG island depletion). Surprisingly, regulatory divergence was also associated with enhanced interspecies promoter conservation. Thus, the genes controlled by complex, highly conserved promoters that facilitate dynamic regulation are also the most susceptible to evolutionary change.