Guylaine Poisson
University of Hawaii at Manoa
Publication
Featured research published by Guylaine Poisson.
Genomics, Proteomics & Bioinformatics | 2007
Guylaine Poisson; Cedric Chauve; Xin Chen; Anne Bergeron
A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins for the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). Firstly, NN selects potential GPI-anchored proteins in a dataset, then HMM parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryote proteomes, FragAnchor predicted that the percentage of highly probable GPI-anchored proteins is between 0.21% and 2.01%. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, making it less sensitive to the background noise found in databases and possible incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine both sensitivity and information content. The predictor is publicly available at http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html.
The ISME Journal | 2013
Grieg F. Steward; Alexander I. Culley; Jaclyn A. Mueller; Elisha M. Wood-Charlson; Mahdi Belcaid; Guylaine Poisson
Viruses are abundant in the ocean and a major driving force in plankton ecology and evolution. It has been assumed that most of the viruses in seawater contain DNA and infect bacteria, but RNA-containing viruses in the ocean, which almost exclusively infect eukaryotes, have never been quantified. We compared the total mass of RNA and DNA in the viral fraction harvested from seawater and using data on the mass of nucleic acid per RNA- or DNA-containing virion, estimated the abundances of each. Our data suggest that the abundance of RNA viruses rivaled or exceeded that of DNA viruses in samples of coastal seawater. The dominant RNA viruses in the samples were marine picorna-like viruses, which have small genomes and are at or below the detection limit of common fluorescence-based counting methods. If our results are typical, this means that counts of viruses and the rate measurements that depend on them, such as viral production, are significantly underestimated by current practices. As these RNA viruses infect eukaryotes, our data imply that protists contribute more to marine viral dynamics than one might expect based on their relatively low abundance. This conclusion is a departure from the prevailing view of viruses in the ocean, but is consistent with earlier theoretical predictions.
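The estimate described in this abstract comes down to dividing the total nucleic acid mass in the viral fraction by the nucleic acid mass per virion, once for RNA and once for DNA. The following is a minimal sketch of that arithmetic only. The function names and every numeric input are hypothetical placeholders, not measurements or code from the study.

```python
def virion_abundance(total_mass_fg: float, mass_per_virion_fg: float) -> float:
    """Estimate a virion count as total nucleic acid mass / mass per virion."""
    if mass_per_virion_fg <= 0:
        raise ValueError("mass per virion must be positive")
    return total_mass_fg / mass_per_virion_fg


def rna_to_dna_ratio(rna_mass_fg: float, rna_per_virion_fg: float,
                     dna_mass_fg: float, dna_per_virion_fg: float) -> float:
    """Ratio of estimated RNA-virus abundance to DNA-virus abundance."""
    rna = virion_abundance(rna_mass_fg, rna_per_virion_fg)
    dna = virion_abundance(dna_mass_fg, dna_per_virion_fg)
    return rna / dna


# Hypothetical inputs in femtograms (illustrative only).
# Less total RNA mass can still mean as many virions if each carries less nucleic acid.
ratio = rna_to_dna_ratio(
    rna_mass_fg=100.0, rna_per_virion_fg=0.25,
    dna_mass_fg=200.0, dna_per_virion_fg=0.5,
)
print(f"estimated RNA:DNA virion ratio = {ratio:.2f}")
```

The sketch illustrates why small-genome RNA viruses can rival DNA viruses in number even when they account for a smaller share of the total nucleic acid mass.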
mBio | 2014
Alexander I. Culley; Jaclyn A. Mueller; Mahdi Belcaid; Elisha M. Wood-Charlson; Guylaine Poisson; Grieg F. Steward
ABSTRACT Viruses have a profound influence on the ecology and evolution of plankton, but our understanding of the composition of the aquatic viral communities is still rudimentary. This is especially true of those viruses having RNA genomes. The limited data that have been published suggest that the RNA virioplankton is dominated by viruses with positive-sense, single-stranded (+ss) genomes that have features in common with those of eukaryote-infecting viruses in the order Picornavirales (picornavirads). In this study, we investigated the diversity of the RNA virus assemblages in tropical coastal seawater samples using targeted PCR and metagenomics. Amplification of RNA-dependent RNA polymerase (RdRp) genes from fractions of a buoyant density gradient suggested that the distribution of two major subclades of the marine picornavirads was largely congruent with the distribution of total virus-like RNA, a finding consistent with their proposed dominance. Analyses of the RdRp sequences in the library revealed the presence of many diverse phylotypes, most of which were related only distantly to those of cultivated viruses. Phylogenetic analysis suggests that there were hundreds of unique picornavirad-like phylotypes in one 35-liter sample that differed from one another by at least as much as the differences among currently recognized species. Assembly of the sequences in the metagenome resulted in the reconstruction of six essentially complete viral genomes that had features similar to viruses in the families Bacillarna-, Dicistro-, and Marnaviridae. Comparison of the tropical seawater metagenomes with those from other habitats suggests that +ssRNA viruses are generally the most common types of RNA viruses in aquatic environments, but biases in library preparation remain a possible explanation for this observation. 
IMPORTANCE Marine plankton account for much of the photosynthesis and respiration on our planet, and they influence the cycling of carbon and the distribution of nutrients on a global scale. Despite the fundamental importance of viruses to plankton ecology and evolution, most of the viruses in the sea, and the identities of their hosts, are unknown. This report is one of very few that delves into the genetic diversity within RNA-containing viruses in the ocean. The data expand the known range of viral diversity and shed new light on the physical properties and genetic composition of RNA viruses in the ocean.
BMC Genomics | 2013
Mark Menor; Kyungim Baek; Guylaine Poisson
Background: Classification is the problem of assigning each input object to one of a finite number of classes. This problem has been extensively studied in machine learning and statistics, and there are numerous applications to bioinformatics as well as many other fields. Building a multiclass classifier has been a challenge, where the direct approach of altering the binary classification algorithm to accommodate more than two classes can be computationally too expensive. Hence the indirect approach of using binary decomposition has been commonly used, in which retrieving the class posterior probabilities from the set of binary posterior probabilities given by the individual binary classifiers has been a major issue.
Methods: In this work, we present an extension of a recently introduced probabilistic kernel-based learning algorithm called the Classification Relevance Units Machine (CRUM) to the multiclass setting to increase its applicability. The extension is achieved under the error correcting output codes framework. The probabilistic outputs of the binary CRUMs are preserved using a proposed linear-time decoding algorithm, an alternative to the generalized Bradley-Terry (GBT) algorithm whose application to large-scale prediction settings is prohibited by its computational complexity. The resulting classifier is called the Multiclass Relevance Units Machine (McRUM).
Results: The evaluation of McRUM on a variety of real small-scale benchmark datasets shows that our proposed Naïve decoding algorithm is computationally more efficient than the GBT algorithm while maintaining a similar level of predictive accuracy. A set of experiments on a larger-scale dataset for small ncRNA classification was then conducted with Naïve McRUM and compared with the Gaussian and linear SVM. Although McRUM's predictive performance is slightly lower than that of the Gaussian SVM, the results show that a similar true positive rate can be achieved by slightly sacrificing the false positive rate. Furthermore, McRUM is computationally more efficient than the SVM, which is an important factor for large-scale analysis.
Conclusions: We have proposed McRUM, a multiclass extension of binary CRUM. McRUM with the Naïve decoding algorithm is computationally efficient in run-time and its predictive performance is comparable to the well-known SVM, showing its potential in solving large-scale multiclass problems in bioinformatics and other fields of study.
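The decoding step described in this abstract, recovering class posteriors from the outputs of binary classifiers under an error correcting output codes (ECOC) scheme, can be illustrated with a generic product-and-normalize rule. The abstract does not spell out the paper's Naïve decoding algorithm, so the sketch below is an assumption about one simple decoder whose cost is linear in the size of the code matrix, not a reproduction of McRUM. The function name and example values are hypothetical.

```python
from typing import Dict, Sequence


def decode_ecoc(code_matrix: Dict[str, Sequence[int]],
                binary_probs: Sequence[float]) -> Dict[str, float]:
    """Combine binary posteriors into class posteriors (generic sketch).

    code_matrix[c][j] is +1 if class c is on the positive side of binary
    classifier j, -1 if on the negative side, and 0 if classifier j ignores c.
    binary_probs[j] is classifier j's posterior for its positive side.
    """
    scores = {}
    for label, row in code_matrix.items():
        score = 1.0
        for bit, p in zip(row, binary_probs):
            if bit == 1:
                score *= p          # classifier agrees with a positive code bit
            elif bit == -1:
                score *= 1.0 - p    # classifier agrees with a negative code bit
        scores[label] = score
    total = sum(scores.values())
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return {label: 1.0 / len(scores) for label in scores}
    return {label: s / total for label, s in scores.items()}


# Hypothetical one-vs-rest code for three classes and made-up binary outputs.
one_vs_rest = {"A": [1, -1, -1], "B": [-1, 1, -1], "C": [-1, -1, 1]}
posteriors = decode_ecoc(one_vs_rest, [0.8, 0.1, 0.1])
print(posteriors)
```

Each class is visited once per binary classifier, which is what keeps this kind of decoder cheap compared with iterative schemes.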
Journal of Computational Biology | 2010
Mahdi Belcaid; Anne Bergeron; Guylaine Poisson
Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications for phage metagenomics assembly, for the horizontal gene transfer paradigm, and more generally for the understanding of the composition and evolutionary dynamics of virus communities.
International Journal of Molecular Sciences | 2015
Mark Menor; Kyungim Baek; Guylaine Poisson
The discovery of novel microRNA (miRNA) and piwi-interacting RNA (piRNA) is an important task for the understanding of many biological processes. Most of the available miRNA and piRNA identification methods are dependent on the availability of the organism’s genome sequence and the quality of its annotation. Therefore, an efficient prediction method based solely on the short RNA reads and requiring no genomic information is highly desirable. In this study, we propose an approach that relies primarily on the nucleotide composition of the read and does not require reference genomes of related species for prediction. Using an empirical Bayesian kernel method and the error correcting output codes framework, compact models suitable for large-scale analyses are built on databases of known mature miRNAs and piRNAs. We found that the usage of an L1-based Gaussian kernel can double the true positive rate compared to the standard L2-based Gaussian kernel. Our approach can increase the true positive rate by at most 60% compared to the existing piRNA predictor based on the analysis of a hold-out test set. Using experimental data, we also show that our approach can detect about an order of magnitude or more known miRNAs than the mature miRNA predictor, miRPlex.
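To make the two ingredients named in this abstract concrete, nucleotide composition of a read and an L1-based versus L2-based Gaussian kernel, here is a minimal sketch. The abstract does not give the exact feature set or kernel parametrization, so the k-mer frequency features, the `gamma` parameter, and the form exp(-gamma * d^2) with d taken as an L1 or L2 distance are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_composition(read: str, k: int = 1) -> list:
    """Relative frequency of each k-mer over the alphabet ACGU (T is read as U)."""
    read = read.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[m] for m in kmers)  # k-mers with other symbols are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


def gaussian_kernel(x, y, gamma: float = 1.0, norm: str = "l2") -> float:
    """Gaussian-shaped kernel exp(-gamma * d^2), with d an L1 or L2 distance."""
    if norm == "l1":
        d = sum(abs(a - b) for a, b in zip(x, y))
    elif norm == "l2":
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return math.exp(-gamma * d * d)


# Two made-up short reads, compared on dinucleotide composition.
x = kmer_composition("UGAGGUAGUAGGUUGUAUAGUU", k=2)
y = kmer_composition("UAGCAGCACGUAAAUAUUGGCG", k=2)
print(gaussian_kernel(x, y, norm="l1"), gaussian_kernel(x, y, norm="l2"))
```

Because the L1 distance between two vectors is never smaller than their L2 distance, the L1 variant decays at least as fast for the same `gamma`, so the two kernels weight dissimilar reads differently.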
ACM SIGAPP Applied Computing Review | 2012
Mark Menor; Kyungim Baek; Guylaine Poisson
Phosphorylation is an important post-translational modification of proteins that is essential to the regulation of many cellular processes. Although most of the phosphorylation sites discovered in protein sequences have been identified experimentally, the in vivo and in vitro discovery of the sites is an expensive, time-consuming and laborious task. Therefore, the development of computational methods for prediction of protein phosphorylation sites has drawn considerable attention. In this work, we present a kernel-based probabilistic Classification Relevance Units Machine (CRUM) for in silico phosphorylation site prediction. In comparison with the popular Support Vector Machine (SVM), CRUM shows comparable predictive performance and yet provides a more parsimonious model. This is desirable since it leads to a reduction in prediction run-time, which is important in predictions on large-scale data. Furthermore, the CRUM training algorithm has lower run-time and memory complexity and has a simpler parameter selection scheme than the Relevance Vector Machine (RVM) learning algorithm. To further investigate the viability of using CRUM in phosphorylation site prediction, we construct multiple CRUM predictors using different combinations of three phosphorylation site features: BLOSUM encoding, disorder, and amino acid composition. The predictors are evaluated through cross-validation and the results show that CRUM with the BLOSUM feature is among the best performing CRUM predictors in both cross-validation and benchmark experiments. A comparative study with existing prediction tools in an independent benchmark experiment suggests possible directions for further improving the predictive performance of CRUM predictors.
Asia-Pacific Bioinformatics Conference | 2007
Mahdi Belcaid; Anne Bergeron; Annie Chateau; Cedric Chauve; Yannick Gingras; Guylaine Poisson; M. Vendette
Genomes evolve with both mutations and large scale events, such as inversions, translocations, duplications and losses, that modify the structure of a set of chromosomes. In order to study these types of large-scale events, the first task is to select, in different genomes, sub-sequences that are considered “equivalent”. Many approaches have been used to identify equivalent sequences, either based on biological experiments, gene annotations, or sequence alignments. These techniques suffer from a variety of drawbacks that often result in the impossibility, for independent researchers, to reproduce the datasets used in the studies, or to adapt them to newly sequenced genomes. In this paper, we show that carefully selected small probes can be efficiently used to construct datasets. Once a set of probes is identified ‐ and published ‐, datasets for whole genome comparisons can be produced, and reproduced, with elementary algorithms; decisions about what is considered an occurrence of a probe in a genome can be criticized and reevaluated; and the structure of a newly sequenced genome can be obtained rapidly, without the need of gene annotations or intensive computations.
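The probe idea in this abstract can be pictured as follows: given a published set of short probes, scan a genome for their occurrences on either strand and report them in genomic order. The sketch below assumes exact string matching on uppercase A/C/G/T sequences. That matching rule is an assumption chosen for simplicity (the abstract notes that what counts as an occurrence is open to debate), and the function names and sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def probe_occurrences(genome: str, probes: dict) -> list:
    """Return (position, probe_id, strand) for every exact probe match, sorted by position."""
    hits = []
    for probe_id, probe in probes.items():
        patterns = [("+", probe)]
        rc = reverse_complement(probe)
        if rc != probe:  # avoid reporting a palindromic probe twice
            patterns.append(("-", rc))
        for strand, pattern in patterns:
            start = genome.find(pattern)
            while start != -1:
                hits.append((start, probe_id, strand))
                start = genome.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


# Toy genome and probe set (made up for illustration).
genome = "ACGGTTTTCCGT"
probes = {"p1": "ACGG"}
for position, probe_id, strand in probe_occurrences(genome, probes):
    print(position, probe_id, strand)
```

Reading the sorted hits as a signed sequence of probe identifiers gives the kind of dataset on which rearrangements such as inversions can be compared across genomes.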
ACM Symposium on Applied Computing | 2012
Mark Menor; Guylaine Poisson; Kyungim Baek
Phosphorylation is an important post-translational modification of proteins that is essential to the regulation of many cellular processes. The in vivo and in vitro discovery of phosphorylation sites is an expensive, time-consuming and laborious task. In this preliminary study, we assess the viability of using our proposed probabilistic Classification Relevance Units Machine (CRUM) for in silico phosphorylation site prediction. We conduct a comparison with the popular Support Vector Machine (SVM) and the Relevance Vector Machine (RVM) that, unlike the SVM, has not been applied to phosphorylation site prediction. The resulting CRUM and RVM predictors offer comparable predictive performance to the SVM. The main advantages of CRUM and RVM over the SVM are: 1. An estimation of the posterior probability of the site being phosphorylatable, providing biologists an important measurement of the uncertainty of the prediction. 2. A more parsimonious model, leading to a reduction in prediction run-time that is important in predictions on large-scale data. Furthermore, the CRUM training algorithm has lower run-time and memory complexity and has a simpler parameter selection scheme than the RVM learning algorithm. Therefore we conclude that the CRUM is the most viable kernel machine for probabilistic prediction of protein phosphorylation sites.
Kew Bulletin | 2011
Guylaine Poisson; Denis Barabé
Summary: The inflorescence of Dracontium polyphyllum consists of 150 – 300 flowers arranged in recognisable spirals. The flower has 5 – 6 (90% of observed specimens), or 7 broad tepals enclosing 9 – 12 stamens (occasionally 7) inserted in two whorls. The gynoecium is trilocular (90% of observed specimens) or tetralocular. The tetralocular gynoecia are found at random among the trilocular gynoecia. Each locule encloses an ovule inserted in an axile position, in the median portion of the ovary. Each carpel has its own stylar canal. However, in the upper portion of the style, there is only one common stylar canal. Floral organs are initiated in an acropetal direction in the following sequence: tepals, stamens, and carpels. During later stages of development, the tepals progressively cover the other floral organs. The first floral primordia are initiated on the upper portion of the inflorescence. During early stages of development, the floral primordia have a circular shape. The tepals are initiated nearly simultaneously. During later stages of development, the first whorl of stamens develops in alternation with the tepals and is followed by a second whorl of stamens. The trilocular or tetralocular nature of the ovary is clearly visible during early stages of development of the gynoecium. Recent molecular studies show that Anaphyllopsis A. Hay and Dracontium L. are closely related. However, although pentamerous flowers have been observed in Anaphyllopsis, the developmental morphology of the flower of Dracontium is different from that of Anaphyllopsis.