Publication


Featured research published by Adrian Schneider.


Bioinformatics | 2007

OMA Browser—Exploring orthologous relations across 352 complete genomes

Adrian Schneider; Christophe Dessimoz; Gaston H. Gonnet

MOTIVATION: Inference of the evolutionary relation between proteins, in particular the identification of orthologs, is a central problem in comparative genomics. Several large-scale efforts with various methodologies and scope tackle this problem, including OMA (the Orthologous MAtrix project). RESULTS: Based on the results of the OMA project, we introduce here the OMA Browser, a web-based tool allowing the exploration of orthologous relations over 352 complete genomes. Orthologs can be viewed as groups across species, but also at the level of sequence pairs, allowing the distinction among one-to-one, one-to-many and many-to-many orthologs. AVAILABILITY: http://omabrowser.org.
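
The distinction among one-to-one, one-to-many and many-to-many orthologs at the level of sequence pairs can be illustrated with a minimal sketch. It assumes the orthologs between two genomes are given as a plain list of (gene_a, gene_b) pairs and labels each pair by how many partners each gene has. The function name and gene identifiers are hypothetical, and this is not necessarily how the OMA Browser derives its relation types.

```python
from collections import defaultdict


def classify_ortholog_pairs(pairs):
    """Label each (gene_a, gene_b) pair as 1:1, 1:many, many:1 or many:many.

    Illustrative assumption: the label depends only on how many
    orthologous partners each gene has in the other genome.
    """
    partners_of_a = defaultdict(set)  # gene in genome A -> partners in B
    partners_of_b = defaultdict(set)  # gene in genome B -> partners in A
    for a, b in pairs:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)

    labels = {}
    for a, b in pairs:
        # The A side is "1" when b has a single partner in genome A.
        a_side = "1" if len(partners_of_b[b]) == 1 else "many"
        # The B side is "1" when a has a single partner in genome B.
        b_side = "1" if len(partners_of_a[a]) == 1 else "many"
        labels[(a, b)] = f"{a_side}:{b_side}"
    return labels


# Hypothetical gene identifiers, for illustration only.
example_pairs = [
    ("h1", "m1"),
    ("h2", "m2"), ("h2", "m3"),
    ("h3", "m4"), ("h4", "m4"), ("h3", "m5"),
]
example_labels = classify_ortholog_pairs(example_pairs)
```

In this toy input, ("h1", "m1") is labelled 1:1, the two pairs involving "h2" are labelled 1:many, and ("h3", "m4") is labelled many:many because both genes have more than one partner.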


Genome Biology and Evolution | 2009

Estimates of Positive Darwinian Selection Are Inflated by Errors in Sequencing, Annotation, and Alignment

Adrian Schneider; Alexander Souvorov; Niv Sabath; Giddy Landan; Gaston H. Gonnet; Dan Graur

Published estimates of the proportion of positively selected genes (PSGs) in human vary over three orders of magnitude. In mammals, estimates of the proportion of PSGs cover an even wider range of values. We used 2,980 orthologous protein-coding genes from human, chimpanzee, macaque, dog, cow, rat, and mouse as well as an established phylogenetic topology to infer the fraction of PSGs in all seven terminal branches. The inferred fraction of PSGs ranged from 0.9% in human through 17.5% in macaque to 23.3% in dog. We found three factors that influence the fraction of genes that exhibit telltale signs of positive selection: the quality of the sequence, the degree of misannotation, and ambiguities in the multiple sequence alignment. The inferred fraction of PSGs in sequences that are deficient in all three criteria of coverage, annotation, and alignment is 7.2 times higher than that in genes with high trace sequencing coverage, “known” annotation status, and perfect alignment scores. We conclude that some estimates on the prevalence of positive Darwinian selection in the literature may be inflated and should be treated with caution.


BMC Bioinformatics | 2005

Empirical codon substitution matrix

Adrian Schneider; Gina M. Cannarozzi; Gaston H. Gonnet

Background: Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution. Results: A set of 17,502 alignments of orthologous sequences from five vertebrate genomes yielded 8.3 million aligned codons from which the number of substitutions between codons were counted. From this data, both a probability matrix and a matrix of similarity scores were computed. They are 64 × 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block diagonal matrices consisting of 61 × 61 entries for the sense codons and 3 × 3 entries for the stop codons. Conclusion: The amount of genomic data currently available allowed for the construction of an empirical codon substitution matrix. However, more sequence data is still needed to construct matrices from different subsets of DNA, specific to kingdoms, evolutionary distance or different amount of synonymous change. Codon mutation matrices have advantages for alignments up to medium evolutionary distances and for usages that require DNA such as ancestral reconstruction of DNA sequences and the calculation of Ka/Ks ratios.
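
The counting step described in the abstract can be sketched roughly as follows. The sketch assumes gap-free, in-frame pairs of aligned coding sequences and the standard genetic code. It counts each aligned codon pair in both directions, skips sense-to-stop pairs so that the count matrix stays block diagonal, and applies a naive row normalisation. All names are hypothetical, and the published matrices were presumably estimated with a more careful procedure than this.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # all 64 codons
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
INDEX = {codon: i for i, codon in enumerate(CODONS)}


def count_codon_pairs(aligned_pairs):
    """Return a symmetric 64 x 64 matrix of aligned-codon counts."""
    counts = [[0] * 64 for _ in range(64)]
    for seq_a, seq_b in aligned_pairs:
        usable = min(len(seq_a), len(seq_b))
        for pos in range(0, usable - 2, 3):
            ca, cb = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
            if ca not in INDEX or cb not in INDEX:
                continue  # skip gaps and ambiguous bases
            if (ca in STOPS) != (cb in STOPS):
                continue  # sense <-> stop substitutions are not considered
            counts[INDEX[ca]][INDEX[cb]] += 1
            counts[INDEX[cb]][INDEX[ca]] += 1  # count both directions
    return counts


def row_normalise(counts):
    """Turn counts into row-wise relative frequencies (naive estimate)."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Toy alignment, for illustration only.
toy_counts = count_codon_pairs([("ATGAAATAA", "ATGAAGTAG")])
toy_probs = row_normalise(toy_counts)
```

Because sense and stop codons never share a nonzero entry, reordering the codons so that the three stop codons come last would expose the 61 × 61 and 3 × 3 blocks mentioned in the abstract.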


PLOS Computational Biology | 2005

A Phylogenomic Study of Human, Dog, and Mouse

Gina M. Cannarozzi; Adrian Schneider; Gaston H. Gonnet

In recent years the phylogenetic relationship of mammalian orders has been addressed in a number of molecular studies. These analyses have frequently yielded inconsistent results with respect to some basal ordinal relationships. For example, the relative placement of primates, rodents, and carnivores has differed in various studies. Here, we attempt to resolve this phylogenetic problem by using data from completely sequenced nuclear genomes to base the analyses on the largest possible amount of data. To minimize the risk of reconstruction artifacts, the trees were reconstructed under different criteria—distance, parsimony, and likelihood. For the distance trees, distance metrics that measure independent phenomena (amino acid replacement, synonymous substitution, and gene reordering) were used, as it is highly improbable that all of the trees would be affected the same way by any reconstruction artifact. In contradiction to the currently favored classification, our results based on full-genome analysis of the phylogenetic relationship between human, dog, and mouse yielded overwhelming support for a primate–carnivore clade with the exclusion of rodents.


Research in Computational Molecular Biology | 2005

OMA, a comprehensive, automated project for the identification of orthologs from complete genome data: introduction and first achievements

Christophe Dessimoz; Gina M. Cannarozzi; Manuel Gil; Daniel Margadant; Alexander Roth; Adrian Schneider; Gaston H. Gonnet

The OMA project is a large-scale effort to identify groups of orthologs from complete genome data, currently 150 species. The algorithm relies solely on protein sequence information and does not require any human supervision. It has several original features, in particular a verification step that detects paralogs and prevents them from being clustered together. Consistency checks and verification are performed throughout the process. The resulting groups, whenever a comparison could be made, are highly consistent both with EC assignments, and with assignments from the manually curated database HAMAP. A highly accurate set of orthologous sequences constitutes the basis for several other investigations, including phylogenetic analysis and protein classification.


Archive | 2012

Codon Evolution: Mechanisms and Models

Gina M. Cannarozzi; Adrian Schneider

The unifying principle in almost all of bioinformatics is sequence analysis: no matter if you are predicting the structure of proteins, analyzing the genetic variation in a population, or deciphering the evolutionary history of your favorite gene, your analysis hinges on looking at biological sequences and how they change over time, between species, or from gene to gene. Biological sequence analysis is the cornerstone. It is no surprise, therefore, that a lot of effort is going into improving our tools for analyzing biological sequences in order to get as much and as accurate information as possible from our data. Especially today when large-scale sequencing projects are becoming commonplace in research groups all around the world, resulting in an unprecedented increase in available biological sequences, the need for proper computational tools for sequence analysis is greater than ever. Comparative genomics and evolutionary studies can now include a huge number of species and genes, and obviously we want to get the most correct information about evolutionary relationships out of the available data. When looking at coding sequences we have access to information on both the DNA and protein level, and these signals can be combined by including the codon usage in protein coding genes. This can greatly improve your analysis as it is shown in numerous examples in the book Codon Evolution: Mechanisms and Models. Here, a selection of outstanding researchers present a thorough overview of this field covering both the theoretical underpinnings and practical applications. Understanding the evolution of codon usage over time, as well as the differences from species to species, we can get a much more complete understanding of sequence evolution. This has great impact on how to perform sequence analysis and, thus, on the field of bioinformatics as a whole.
The first part of the book, covering 12 chapters, describes different models of codon evolution. In chapter 1, A. Schneider and G.M. Cannarozzi introduce the subject matter and present notation and definitions used throughout the book. The chapter also briefly covers various widely used models, such as Markov models and maximum likelihood. This short introduction makes the book somewhat self-contained, although some background knowledge is useful to appreciate the details. Chapter 2 by M. Anisimova describes parametric models of codon evolution and gives a thorough and well written introduction and overview of the field. This chapter covers a lot of ground from simply modeling codon frequencies through tests for selection to a discussion on modeling site dependencies. I enjoyed this chapter and found the review-like nature of it very useful. The next chapter by A. Schneider and G.M. Cannarozzi takes a different approach by describing empirical models of codon usage based on substitution matrices in the spirit of BLOSUM and PAM. As has been the case in many other fields of bioinformatics, Bayesian statistics is also useful in the realm of codons, and in chapter 4 N. Rodrigue and N. Lartillot cover Monte Carlo approaches to codon substitution models. Using Markov chain Monte Carlo and simulated annealing under the well-known Metropolis-Hastings kernel (which has been used with success in other fields including structure prediction of RNA and protein, multiple alignment, and phylogeny), the authors present a framework for complex models where it would be impossible to numerically evaluate the likelihood function. It is well-known that evolutionary rates (e.g. synonymous to non-synonymous substitutions) can vary between sites. Chapter 5 by H. Gu, K.S. Dunn and J.P. Bielawski presents the use of likelihood-based clustering to partition sites into distinct groups, each governed by a specific model, and they illustrate the utility of this approach on a large set of transmembrane proteins.
In the following chapter, M. Anisimova and D.A. Liberles discuss how to detect natural selection in a statistical framework, and they give a very good introduction to the field. This is an interesting chapter, and especially the section on some of the common mistakes made in the field could become a useful resource. This ties in well with chapter 8, where G.A. Huttley and V.B. Yap show how important the assumptions in any given model are when estimating selection. One of the most interesting chapters to me was chapter 7 by J.L. Thorne et al. Here, they review methods for comparing variation in protein coding genes between species within the realm of population genetics. In population genetics, most of the focus is on variation within a population but by taking a broader look and comparing inter-specific sequences, it becomes possible to look at mutations that became fixed long ago and which would not be visible within a single species. After a very short chapter by M. Arenas and D. Posada on how to simulate the evolution of coding sequences, chapter 10 revisits the fact that we gain much more information by looking at codons rather than amino acids when analysing coding sequences. However, as S.A. Brenner points out, this leads to models that are hard to fully parameterize and he therefore discusses how to circumvent this problem by reducing the number of free parameters by grouping codons based on, for example, the observation that some codons are converted to other synonymous codons by purine to purine mutations. This discussion is important for accurately dating divergence times. This leads nicely into the next chapter by B.S.W. Chang et al. which is a review of ancestral sequence reconstruction methods and models of divergence between clades. To finish off the first part of the book, chapter 12 by G. Aguileta and T. Giraud reviews studies on fungal genomes using codon models to investigate various aspects of their evolutionary history.
This ties in well with the aforementioned increase in available sequence data due to next-generation sequencing technology. The second and shorter part of the book describes different aspects of codon usage bias, which is known to vary across species and between genes. The first chapter by A. Roth, M. Anisimova and G.M. Cannarozzi sets the stage by presenting an in-depth review of most (if not all) the various measures of codon bias that have been proposed to date. This is followed by a chapter by N.D. Rubinstein and T. Pupko who discuss the conservation of synonymous mutations, i.e. changes in codons that do not alter the encoded protein. The authors point to various reasons why synonymous mutations can affect fitness through e.g. translation efficiency, mRNA structure, or splicing signals. The same theme is covered in chapter 15 where F. Supek and T. Smuc discuss biases in codon usage and how to quantify these differences. They present both supervised and unsupervised methods for analyzing codon usage and show an application to genome data from archaea and bacteria. The following chapter by K. Zeng takes a population genetics view on codon usage bias by looking at synonymous polymorphisms within a group. This is an interesting review of two models where the author compares and contrasts the two. The last chapter in the book by M.d.C. Santos and M.A.S. Santos discusses the various deviations from the standard genetic code and how these changes may occur naturally. This review is an interesting read and presents some interesting avenues for future research reaching back to the very root of the evolutionary tree connecting all life. All in all, Trends in Evolutionary Biology 2012; volume 4:e8


Molecular Biology and Evolution | 2009

Support Patterns from Different Outgroups Provide a Strong Phylogenetic Signal

Adrian Schneider; Gina M. Cannarozzi

It is known that the accuracy of phylogenetic reconstruction decreases when more distant outgroups are used. We quantify this phenomenon with a novel scoring method, the outgroup score pOG. This score expresses if the support for a particular branch of a tree decreases with increasingly distant outgroups. Large-scale simulations confirmed that the outgroup support follows this expectation and that the pOG score captures this pattern. The score often identifies the correct topology even when the primary reconstruction methods fail, particularly in the presence of model violations. In simulations of problematic phylogenetic scenarios such as rate variation among lineages (which can lead to long-branch attraction artifacts) and quartet-based reconstruction, the pOG analysis outperformed the primary reconstruction methods. Because the pOG method does not make any assumptions about the evolutionary model (besides the decreasing support from increasingly distant outgroups), it can detect cases of violations not treated by a specific model or too strong to be fully corrected. When used as an optimization criterion in the construction of a tree of 23 mammals, the outgroup signal confirmed many well-accepted mammalian orders and superorders. It supports Atlantogenata, a clade of Afrotheria and Xenarthra, and suggests an Artiodactyla-Chiroptera clade.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2007

SynPAM—A Distance Measure Based on Synonymous Codon Substitutions

Adrian Schneider; Gaston H. Gonnet; Gina M. Cannarozzi

Measuring evolutionary distances between DNA or protein sequences forms the basis of many applications in computational biology and evolutionary studies. Of particular interest are distances based on synonymous substitutions since these substitutions are considered to be under very little selection pressure and therefore assumed to accumulate in an almost clock-like manner. SynPAM, the method presented here, allows the estimation of distances between coding DNA sequences based on synonymous codon substitutions. The problem of estimating an accurate distance from the observed substitution pattern is solved by maximum likelihood with empirical codon substitution matrices employed for the underlying Markov model. Comparisons with established measures of synonymous distance indicate that SynPAM has less variance and yields useful results over a longer time range.
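
The idea of estimating a distance by maximum likelihood under a Markov model can be illustrated with a toy sketch. It assumes a PAM-style formalism in which the substitution probabilities at distance d are the d-th power of a one-step matrix, and it searches integer distances only. The two-state matrix below is a made-up stand-in for an empirical codon matrix; none of this reproduces the SynPAM implementation.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def log_likelihood(counts, probs):
    """Log-likelihood of observed substitution counts given probabilities.

    Equilibrium frequencies are omitted because they do not depend on the
    distance and therefore do not change where the maximum lies.
    """
    total = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue
            if probs[i][j] <= 0.0:
                return float("-inf")
            total += n_ij * math.log(probs[i][j])
    return total


def ml_distance(counts, step_matrix, max_steps=500):
    """Return the integer distance d in 1..max_steps maximising the likelihood."""
    n = len(step_matrix)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    best_d, best_ll = 0, float("-inf")
    for d in range(1, max_steps + 1):
        power = mat_mul(power, step_matrix)  # power now equals step_matrix ** d
        ll = log_likelihood(counts, power)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d, best_ll


# Toy two-state model: 1% chance of change per unit of distance (assumption).
toy_step = [[0.99, 0.01], [0.01, 0.99]]
# Toy observation: 90 unchanged positions and 10 changed positions.
toy_counts = [[45, 5], [5, 45]]
toy_d, toy_ll = ml_distance(toy_counts, toy_step)
```

For this toy input the search settles on a distance of 11 steps, which is larger than the 10 changes per 100 positions that were observed, since the model accounts for repeated changes at the same position.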


BMC Bioinformatics | 2006

Fast estimation of the difference between two PAM/JTT evolutionary distances in triplets of homologous sequences

Christophe Dessimoz; Manuel Gil; Adrian Schneider; Gaston H. Gonnet

Background: The estimation of the difference between two evolutionary distances within a triplet of homologs is a common operation that is used for example to determine which of two sequences is closer to a third one. The most accurate method is currently maximum likelihood over the entire triplet. However, this approach is relatively time consuming. Results: We show that an alternative estimator, based on pairwise estimates and therefore much faster to compute, has almost the same statistical power as the maximum likelihood estimator. We also provide a numerical approximation for its variance, which could otherwise only be estimated through an expensive re-sampling approach such as bootstrapping. An extensive simulation demonstrates that the approximation delivers precise confidence intervals. To illustrate the possible applications of these results, we show how they improve the detection of asymmetric evolution, and the identification of the closest relative to a given sequence in a group of homologs. Conclusion: The results presented in this paper constitute a basis for large-scale protein cross-comparisons of pairwise evolutionary distances.


International Conference on Computational Science | 2006

Synonymous codon substitution matrices

Adrian Schneider; Gaston H. Gonnet; Gina M. Cannarozzi

Observing differences between DNA or protein sequences and estimating the true amount of substitutions from them is a prominent problem in molecular evolution as many analyses are based on distance measures between biological sequences. Since the relationship between the observed and the actual amount of mutations is very complex, more than four decades of research have been spent to improve molecular distance measures. In this article we present a method called SynPAM which can be used to estimate the amount of synonymous change between sequences of coding DNA. The method is novel in that it is based on an empirical model of codon evolution and that it uses a maximum-likelihood formalism to measure synonymous change in terms of codon substitutions, while reducing the need for assumptions about DNA evolution to an absolute minimum. We compared the SynPAM method with two established methods for measuring synonymous sequence divergence. Our results suggest that this new method not only shows less variance, but is also able to capture weaker phylogenetic signals than the other methods.

Collaboration


Dive into Adrian Schneider's collaborations.

Top Co-Authors


Alexander Souvorov

National Institutes of Health

Dan Graur

University of Houston
