Gina M. Cannarozzi | Researchain

Publication

Featured research published by Gina M. Cannarozzi.

Cell | 2010

A Role for Codon Order in Translation Dynamics

Gina M. Cannarozzi; Nicol N. Schraudolph; Mahamadou Faty; Peter von Rohr; Markus T. Friberg; Alexander Roth; Pedro Gonnet; Gaston H. Gonnet; Yves Barral

The genetic code is degenerate. Each amino acid is encoded by up to six synonymous codons; the choice between these codons influences gene expression. Here, we show that in coding sequences, once a particular codon has been used, subsequent occurrences of the same amino acid do not use codons randomly, but favor codons that use the same tRNA. The effect is pronounced in rapidly induced genes, involves both frequent and rare codons and diminishes only slowly as a function of the distance between subsequent synonymous codons. Furthermore, we found that in S. cerevisiae codon correlation accelerates translation relative to the translation of synonymous yet anticorrelated sequences. The data suggest that tRNA diffusion away from the ribosome is slower than translation, and that some tRNA channeling takes place at the ribosome. They also establish that the dynamics of translation leave a significant signature at the level of the genome.

Nucleic Acids Research | 2012

FastML: a web server for probabilistic reconstruction of ancestral sequences

Haim Ashkenazy; Osnat Penn; Adi Doron-Faigenboim; Ofir Cohen; Gina M. Cannarozzi; Oren Zomer; Tal Pupko

Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous-time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable to nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available to all academic users online at http://fastml.tau.ac.il/.
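
The indel-coding idea in (i) can be illustrated with a minimal Python sketch. It assumes the simplest possible scheme: every distinct gap (start, end) found in the alignment becomes one binary character, and a sequence is coded 1 if it carries exactly that gap and 0 otherwise. The function names and the 1/0 convention are illustrative assumptions, and overlapping gaps are not treated specially; this is not FastML's actual implementation.

```python
def find_gaps(row):
    """Return (start, end) pairs, end exclusive, for each maximal run of '-' in one aligned sequence."""
    gaps, start = [], None
    for i, ch in enumerate(row):
        if ch == "-" and start is None:
            start = i                      # a gap run begins here
        elif ch != "-" and start is not None:
            gaps.append((start, i))        # the gap run ended just before i
            start = None
    if start is not None:                  # gap run reaches the end of the row
        gaps.append((start, len(row)))
    return gaps


def code_indels(alignment):
    """Code every distinct gap in the alignment as one binary character (hypothetical helper)."""
    per_row = [set(find_gaps(row)) for row in alignment]
    characters = sorted(set().union(*per_row))
    # 1 = this sequence carries exactly this gap, 0 = it does not (assumed convention)
    matrix = [[1 if gap in gaps else 0 for gap in characters] for gaps in per_row]
    return characters, matrix


alignment = ["AC--GT", "ACTTGT", "A---GT"]
characters, matrix = code_indels(alignment)
print(characters)
print(matrix)
```

The resulting 0/1 matrix is the kind of binary data on which a two-state continuous-time Markov model could then be fitted along the tree.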

BMC Bioinformatics | 2005

Empirical codon substitution matrix

Adrian Schneider; Gina M. Cannarozzi; Gaston H. Gonnet

Background: Codon substitution probabilities are used in many types of molecular evolution studies, such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution.

Results: A set of 17,502 alignments of orthologous sequences from five vertebrate genomes yielded 8.3 million aligned codons, from which the number of substitutions between codons was counted. From these data, both a probability matrix and a matrix of similarity scores were computed. They are 64 × 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block diagonal matrices consisting of 61 × 61 entries for the sense codons and 3 × 3 entries for the stop codons.

Conclusion: The amount of genomic data currently available allowed for the construction of an empirical codon substitution matrix. However, more sequence data is still needed to construct matrices from different subsets of DNA, specific to kingdoms, evolutionary distance or different amounts of synonymous change. Codon mutation matrices have advantages for alignments up to medium evolutionary distances and for usages that require DNA, such as ancestral reconstruction of DNA sequences and the calculation of Ka/Ks ratios.
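
The counting step behind such a matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sequences are already aligned in frame, each aligned codon pair is counted in both directions, and rows are normalized to observed frequencies. The function names are hypothetical, and the sketch does not reproduce the authors' estimation procedure, their similarity scores, or the separation of sense and stop codons.

```python
from itertools import product

# All 64 codons in a fixed order, plus a lookup from codon to matrix index.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}


def count_codon_pairs(aligned_pairs):
    """Count aligned codon pairs from in-frame pairwise alignments (hypothetical helper)."""
    counts = [[0] * 64 for _ in range(64)]
    for seq_a, seq_b in aligned_pairs:
        for pos in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
            a, b = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
            if a in INDEX and b in INDEX:      # skips gaps and ambiguous bases
                counts[INDEX[a]][INDEX[b]] += 1
                counts[INDEX[b]][INDEX[a]] += 1  # count both directions (assumed symmetry)
    return counts


def row_normalize(counts):
    """Turn each row of counts into observed frequencies; empty rows stay all zero."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([x / total if total else 0.0 for x in row])
    return probs


pairs = [("ATGAAA---", "ATGAAGCCC")]
counts = count_codon_pairs(pairs)
probs = row_normalize(counts)
print(counts[INDEX["AAA"]][INDEX["AAG"]], probs[INDEX["AAA"]][INDEX["AAG"]])
```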

PLOS Computational Biology | 2005

A Phylogenomic Study of Human, Dog, and Mouse

Gina M. Cannarozzi; Adrian Schneider; Gaston H. Gonnet

In recent years the phylogenetic relationship of mammalian orders has been addressed in a number of molecular studies. These analyses have frequently yielded inconsistent results with respect to some basal ordinal relationships. For example, the relative placement of primates, rodents, and carnivores has differed in various studies. Here, we attempt to resolve this phylogenetic problem by using data from completely sequenced nuclear genomes to base the analyses on the largest possible amount of data. To minimize the risk of reconstruction artifacts, the trees were reconstructed under different criteria—distance, parsimony, and likelihood. For the distance trees, distance metrics that measure independent phenomena (amino acid replacement, synonymous substitution, and gene reordering) were used, as it is highly improbable that all of the trees would be affected the same way by any reconstruction artifact. In contradiction to the currently favored classification, our results based on full-genome analysis of the phylogenetic relationship between human, dog, and mouse yielded overwhelming support for a primate–carnivore clade with the exclusion of rodents.

Research in Computational Molecular Biology | 2005

OMA, a comprehensive, automated project for the identification of orthologs from complete genome data: introduction and first achievements

Christophe Dessimoz; Gina M. Cannarozzi; Manuel Gil; Daniel Margadant; Alexander Roth; Adrian Schneider; Gaston H. Gonnet

The OMA project is a large-scale effort to identify groups of orthologs from complete genome data, currently 150 species. The algorithm relies solely on protein sequence information and does not require any human supervision. It has several original features, in particular a verification step that detects paralogs and prevents them from being clustered together. Consistency checks and verification are performed throughout the process. The resulting groups, whenever a comparison could be made, are highly consistent both with EC assignments, and with assignments from the manually curated database HAMAP. A highly accurate set of orthologous sequences constitutes the basis for several other investigations, including phylogenetic analysis and protein classification.

Archive | 2012

Codon Evolution: Mechanisms and Models

Gina M. Cannarozzi; Adrian Schneider

The unifying principle in almost all of bioinformatics is sequence analysis: whether you are predicting the structure of proteins, analyzing the genetic variation in a population, or deciphering the evolutionary history of your favorite gene, your analysis hinges on looking at biological sequences and how they change over time, between species, or from gene to gene. Biological sequence analysis is the cornerstone. It is no surprise, therefore, that a lot of effort is going into improving our tools for analyzing biological sequences in order to get as much and as accurate information as possible from our data. Especially today, when large-scale sequencing projects are becoming commonplace in research groups all around the world, resulting in an unprecedented increase in available biological sequences, the need for proper computational tools for sequence analysis is greater than ever.

Comparative genomics and evolutionary studies can now include a huge number of species and genes, and obviously we want to get the most correct information about evolutionary relationships out of the available data. When looking at coding sequences we have access to information on both the DNA and protein level, and these signals can be combined by including the codon usage in protein-coding genes. This can greatly improve your analysis, as is shown in numerous examples in the book Codon Evolution: Mechanisms and Models. Here, a selection of outstanding researchers present a thorough overview of this field covering both the theoretical underpinnings and practical applications. By understanding the evolution of codon usage over time, as well as the differences from species to species, we can get a much more complete understanding of sequence evolution. This has great impact on how to perform sequence analysis and, thus, on the field of bioinformatics as a whole.

The first part of the book, covering 12 chapters, describes different models of codon evolution. In chapter 1, A. Schneider and G.M. Cannarozzi introduce the subject matter and present notation and definitions used throughout the book. The chapter also briefly covers various widely used models, such as Markov models and maximum likelihood. This short introduction makes the book somewhat self-contained, although some background knowledge is useful to appreciate the details. Chapter 2 by M. Anisimova describes parametric models of codon evolution and gives a thorough and well-written introduction and overview of the field. This chapter covers a lot of ground, from simply modeling codon frequencies through tests for selection to a discussion on modeling site dependencies. I enjoyed this chapter and found its review-like nature very useful. The next chapter by A. Schneider and G.M. Cannarozzi takes a different approach by describing empirical models of codon usage based on substitution matrices in the spirit of BLOSUM and PAM. As has been the case in many other fields of bioinformatics, Bayesian statistics is also useful in the realm of codons, and in chapter 4 N. Rodrigue and N. Lartillot cover Monte Carlo approaches to codon substitution models. Using Markov chain Monte Carlo and simulated annealing under the well-known Metropolis-Hastings kernel (which has been used with success in other fields including structure prediction of RNA and protein, multiple alignment, and phylogeny), the authors present a framework for complex models where it would be impossible to numerically evaluate the likelihood function. It is well known that evolutionary rates (e.g. synonymous to non-synonymous substitutions) can vary between sites. Chapter 5 by H. Gu, K.S. Dunn and J.P. Bielawski presents the use of likelihood-based clustering to partition sites into distinct groups, each governed by a specific model, and they illustrate the utility of this approach on a large set of transmembrane proteins.

In the following chapter, M. Anisimova and D.A. Liberles discuss how to detect natural selection in a statistical framework, and they give a very good introduction to the field. This is an interesting chapter, and especially the section on some of the common mistakes made in the field could become a useful resource. This ties in well with chapter 8, where G.A. Huttley and V.B. Yap show how important the assumptions in any given model are when estimating selection. One of the most interesting chapters to me was chapter 7 by J.L. Thorne et al. Here, they review methods for comparing variation in protein-coding genes between species within the realm of population genetics. In population genetics, most of the focus is on variation within a population, but by taking a broader look and comparing inter-specific sequences, it becomes possible to look at mutations that became fixed long ago and which would not be visible within a single species. After a very short chapter by M. Arenas and D. Posada on how to simulate the evolution of coding sequences, chapter 10 revisits the fact that we gain much more information by looking at codons rather than amino acids when analysing coding sequences. However, as S.A. Brenner points out, this leads to models that are hard to fully parameterize, and he therefore discusses how to circumvent this problem by reducing the number of free parameters by grouping codons based on, for example, the observation that some codons are converted to other synonymous codons by purine-to-purine mutations. This discussion is important for accurately dating divergence times. This leads nicely into the next chapter by B.S.W. Chang et al., which is a review of ancestral sequence reconstruction methods and models of divergence between clades. To finish off the first part of the book, chapter 12 by G. Aguileta and T. Giraud reviews studies on fungal genomes using codon models to investigate various aspects of their evolutionary history. This ties in well with the aforementioned increase in available sequence data due to next-generation sequencing technology.

The second and shorter part of the book describes different aspects of codon usage bias, which is known to vary across species and between genes. The first chapter by A. Roth, M. Anisimova and G.M. Cannarozzi sets the stage by presenting an in-depth review of most (if not all) of the various measures of codon bias that have been proposed to date. This is followed by a chapter by N.D. Rubinstein and T. Pupko, who discuss the conservation of synonymous mutations, i.e. changes in codons that do not alter the encoded protein. The authors point to various reasons why synonymous mutations can affect fitness through e.g. translation efficiency, mRNA structure, or splicing signals. The same theme is covered in chapter 15, where F. Supek and T. Smuc discuss biases in codon usage and how to quantify these differences. They present both supervised and unsupervised methods for analyzing codon usage and show an application to genome data from archaea and bacteria. The following chapter by K. Zeng takes a population genetics view on codon usage bias by looking at synonymous polymorphisms within a group. This is an interesting review of two models where the author compares and contrasts the two. The last chapter in the book by M.d.C. Santos and M.A.S. Santos discusses the various deviations from the standard genetic code and how these changes may occur naturally. This review is an interesting read and presents some interesting avenues for future research reaching back to the very root of the evolutionary tree connecting all life. All in all, …

Trends in Evolutionary Biology 2012; volume 4:e8

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2007

SynPAM—A Distance Measure Based on Synonymous Codon Substitutions

Adrian Schneider; Gaston H. Gonnet; Gina M. Cannarozzi

Measuring evolutionary distances between DNA or protein sequences forms the basis of many applications in computational biology and evolutionary studies. Of particular interest are distances based on synonymous substitutions since these substitutions are considered to be under very little selection pressure and therefore assumed to accumulate in an almost clock-like manner. SynPAM, the method presented here, allows the estimation of distances between coding DNA sequences based on synonymous codon substitutions. The problem of estimating an accurate distance from the observed substitution pattern is solved by maximum likelihood with empirical codon substitution matrices employed for the underlying Markov model. Comparisons with established measures of synonymous distance indicate that SynPAM has less variance and yields useful results over a longer time range.

Bioinformatics | 2000

A cross-comparison of a large dataset of genes

Gina M. Cannarozzi; Michael Hallett; J. Norberg; Xianghong Zhou

Summary: We make available a large cross-comparison for 16 of the completely sequenced genomes and additional eukaryotic genes. The alignments were performed at the protein level using liberal similarity bounds in order to capture as many significant alignments as possible. This dataset will be updated as new genomes become available.

International Conference on Computational Science | 2006

Synonymous codon substitution matrices

Adrian Schneider; Gaston H. Gonnet; Gina M. Cannarozzi

Observing differences between DNA or protein sequences and estimating the true amount of substitutions from them is a prominent problem in molecular evolution as many analyses are based on distance measures between biological sequences. Since the relationship between the observed and the actual amount of mutations is very complex, more than four decades of research have been spent to improve molecular distance measures. In this article we present a method called SynPAM which can be used to estimate the amount of synonymous change between sequences of coding DNA. The method is novel in that it is based on an empirical model of codon evolution and that it uses a maximum-likelihood formalism to measure synonymous change in terms of codon substitutions, while reducing the need for assumptions about DNA evolution to an absolute minimum. We compared the SynPAM method with two established methods for measuring synonymous sequence divergence. Our results suggest that this new method not only shows less variance, but is also able to capture weaker phylogenetic signals than the other methods.

Technical report / Swiss Federal Institute of Technology Zurich, Department of Computer Science | 2007

Recognizing proteins by weight of their digested parts

Gaston H. Gonnet; Gina M. Cannarozzi

Traditionally, proteins were identified by de novo sequencing, notably via the Edman degradation. As the protein databases grew, correlating experimental information with the information in sequence databases provided a faster means of identification. Mass spectrometry provides a set of weights of protein fragments which can be compared to existing sequence databases. This procedure, called mass mapping, is a very effective means of identifying proteins. The method described here is not suited to determining the composition of an unknown protein (a separate field of study), but it is effective in identifying an unknown sample if its sequence is recorded in a protein database. For a review of the use of mass spectrometry in proteomics, see Chem. Rev. 2001, 101, 269-295 by Aebersold and Goodlett. One way of breaking a protein into smaller pieces according to a certain pattern is to use enzymes which digest the protein. For example, trypsin breaks a protein after every Arginine (R) or after every Lysine (K) not followed by a Proline (P). It is not very difficult, given the rules, to write a function which will do the theoretical digestion of a sequence. The function for trypsin is:
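
The report's own function is not included in this excerpt. As a stand-in, here is a minimal Python sketch of a theoretical tryptic digestion. It assumes the commonly used reading of the rule, in which the proline exception applies to both R and K, and it takes the sequence as a string of one-letter amino acid codes. The name `trypsin_digest` is illustrative and not taken from the report.

```python
def trypsin_digest(sequence):
    """Split a protein sequence after every K or R that is not followed by P.

    Assumption: the 'not followed by Proline' exception applies to both
    Lysine (K) and Arginine (R).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        # Residue after the current one, or "" at the end of the sequence.
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing fragment that does not end in a cleavage site.
        peptides.append(sequence[start:])
    return peptides


print(trypsin_digest("AKRPGKPLRMK"))  # ['AK', 'RPGKPLR', 'MK']
```

For mass mapping, the weight of each resulting peptide would then be computed from a table of residue masses and compared with the measured fragment weights; that step is not shown here.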