Publication


Featured research published by William Noble Grundy.


research in computational molecular biology | 2001

Gene functional classification from heterogeneous data

Paul Pavlidis; Jason Weston; Jinsong Cai; William Noble Grundy

In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. We also show how to use knowledge about heterogeneity to aid in feature selection.
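
The abstract above does not give the form of the heterogeneous kernel, so the following is only a minimal sketch under one assumption: that the kernel is built as a sum of separate kernels, one per data type, so that expression features and phylogenetic features never appear together in a cross term. The function names, the polynomial degree, and the toy array sizes are all hypothetical.

```python
import numpy as np

def poly_kernel(X, Y, degree=3):
    """Polynomial kernel (x . y + 1)^degree between the rows of X and the rows of Y."""
    return (X @ Y.T + 1.0) ** degree

def heterogeneous_kernel(expr_a, phylo_a, expr_b, phylo_b, degree=3):
    """Sum of one kernel per data type (an assumed form, not taken from the paper).

    Each data type gets its own polynomial kernel, so polynomial cross terms
    only combine features of the same type.
    """
    return poly_kernel(expr_a, expr_b, degree) + poly_kernel(phylo_a, phylo_b, degree)

# Toy data: 6 genes, 4 expression measurements, 3 binary phylogenetic-profile entries.
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 4))
phylo = rng.integers(0, 2, size=(6, 3)).astype(float)

# Gram matrix over the 6 genes, one row and one column per gene.
K = heterogeneous_kernel(expr, phylo, expr, phylo)
```

A Gram matrix like `K` could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.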


intelligent systems in molecular biology | 2000

Protein Family Classification Using Sparse Markov Transducers

Eleazar Eskin; William Noble Grundy; Yoram Singer

We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wild-cards in the conditioning sequences. Since substitutions of amino acids are common in protein families, incorporating wild-cards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. As protein databases become larger, data driven learning algorithms for probabilistic models such as SMTs will require vast amounts of memory. We therefore describe and use efficient data structures to improve the memory usage of SMTs. We evaluate SMTs by building protein family classifiers using the Pfam and SCOP databases and compare our results to previously published results and state-of-the-art protein homology detection methods. SMTs outperform previous probabilistic suffix tree methods and under certain conditions perform comparably to state-of-the-art protein homology methods.
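
To illustrate only the wild-card idea from the abstract above, the sketch below counts which amino acid follows a conditioning pattern in which some positions may match any residue. It is not a sparse Markov transducer: there is no tree mixture, weighting, or smoothing here, and the `*` symbol, function names, and toy sequences are hypothetical.

```python
from collections import Counter

WILDCARD = "*"  # hypothetical symbol meaning "any amino acid at this position"

def matches(pattern, context):
    """True if context has the pattern's length and agrees at every non-wildcard position."""
    return len(pattern) == len(context) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, context)
    )

def conditional_counts(sequences, pattern):
    """Count the residues that immediately follow any context matching the pattern."""
    counts = Counter()
    n = len(pattern)
    for seq in sequences:
        for i in range(n, len(seq)):
            if matches(pattern, seq[i - n:i]):
                counts[seq[i]] += 1
    return counts

def conditional_distribution(sequences, pattern):
    """Maximum-likelihood estimate of P(next residue | pattern); empty if never matched."""
    counts = conditional_counts(sequences, pattern)
    total = sum(counts.values())
    return {residue: c / total for residue, c in counts.items()} if total else {}

toy_sequences = ["ACDE", "AGDE", "ACDF"]
exact = conditional_counts(toy_sequences, "ACD")     # only contexts equal to "ACD"
relaxed = conditional_counts(toy_sequences, "A*D")   # also pools the "AGD" context
```

With the wild-card pattern, the substituted context `AGD` contributes to the same estimate as `ACD`, which is the pooling effect the abstract attributes to wild-cards.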


pacific symposium on biocomputing | 2000

Promoter region-based classification of genes.

Paul Pavlidis; Terrence S. Furey; M. Liberto; David Haussler; William Noble Grundy

In this paper we consider the problem of extracting information from the upstream untranslated regions of genes to make predictions about their transcriptional regulation. We present a method for classifying genes based on motif-based hidden Markov models (HMMs) of their promoter regions. Sequence motifs discovered in yeast promoters are used to construct HMMs that include parameters describing the number and relative locations of motifs within each sequence. Each model provides a Fisher kernel for a support vector machine, which can be used to predict the classifications of unannotated promoters. We demonstrate this method on two classes of genes from the budding yeast, S. cerevisiae. Our results suggest that the additional sequence features captured by the HMM assist in correctly classifying promoters.
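
For reference, the Fisher kernel mentioned above is usually defined as follows, where P(x | θ) is the trained generative model (here an HMM) and U_x is the Fisher score of sequence x. This is the standard textbook form; the abstract does not say whether the paper uses the Fisher information matrix I or an approximation to it, so that detail should be read as an assumption.

```latex
U_x = \nabla_\theta \log P(x \mid \theta), \qquad
K(x, y) = U_x^{\top} I^{-1} U_y, \qquad
I = \mathbb{E}_x\!\left[ U_x U_x^{\top} \right]
```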


Journal of Computational Biology | 1998

Homology detection via family pairwise search.

William Noble Grundy

The function of an unknown biological sequence can often be accurately inferred by identifying sequences homologous to the original sequence. Given a query set of known homologs, there exist at least three general classes of techniques for finding additional homologs: pairwise sequence comparisons, motif analysis, and hidden Markov modeling. Pairwise sequence comparisons are typically employed when only a single query sequence is known. Hidden Markov models (HMMs), on the other hand, are usually trained with sets of more than 100 sequences. Motif-based methods fall in between these two extremes. The current work introduces a straightforward generalization of pairwise sequence comparison algorithms to the case when multiple query sequences are available. This algorithm, called Family Pairwise Search (FPS), combines pairwise sequence comparison scores from each query sequence. A BLAST implementation of FPS is compared to representative examples of hidden Markov modeling (HMMER) and motif modeling (MEME). The three techniques are compared across a wide range of protein families, using query sets of varying sizes. BLAST FPS significantly outperforms motif-based and HMM methods. Furthermore, FPS is much more efficient than the training algorithms for statistical models.
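
The core idea described above, scoring each database sequence against every query and combining the pairwise scores, can be sketched as follows. The abstract does not say how scores are combined, so the `max` and `mean` choices here are assumptions, and `shared_kmers` is a hypothetical toy stand-in for a real pairwise comparison such as a BLAST score.

```python
from statistics import mean

def shared_kmers(a, b, k=3):
    """Toy pairwise score: number of distinct k-mers the two sequences share."""
    def kmers(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return len(kmers(a) & kmers(b))

def family_pairwise_scores(queries, database, pairwise_score=shared_kmers, combine=max):
    """Score every database sequence against all queries and combine the scores."""
    return {
        name: combine([pairwise_score(q, seq) for q in queries])
        for name, seq in database.items()
    }

# Hypothetical query family and database.
queries = ["MKVLAAGIV", "MKVLSAGIV"]
database = {"hit": "MKVLAAGLV", "miss": "QQQWWWEEE"}

best_scores = family_pairwise_scores(queries, database, combine=max)
mean_scores = family_pairwise_scores(queries, database, combine=mean)

# Database entries ranked from most to least similar to the query family.
ranking = sorted(best_scores, key=best_scores.get, reverse=True)
```

No model is trained in this scheme; the cost is one pairwise comparison per query and database sequence, which is consistent with the efficiency claim in the abstract.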


Archive | 2000

Combining Microarray Expression Data and Phylogenetic Profiles to Learn Gene Functional Categories Using Support Vector Machines

Paul Pavlidis; William Noble Grundy

A primary goal in biology is to understand the molecular machinery of the cell. The sequencing projects currently underway provide one view of this machinery. A complementary view is provided by data from DNA microarray hybridization experiments. Synthesizing the information from these disparate types of data requires the development of improved computational techniques. We demonstrate how to apply a machine learning algorithm called support vector machines to a heterogeneous data set consisting of expression data as well as phylogenetic profiles derived from sequence similarity searches against a collection of complete genomes. The two types of data provide accurate pictures of overlapping subsets of the gene functional categories present in the cell. Combining the expression data and phylogenetic profiles within a single learning algorithm frequently yields superior classification performance compared to using either data set alone. However, the improvement is not uniform across functional classes. For the data sets investigated here, 23-element phylogenetic profiles typically provide more information than 79-element expression vectors. Often, adding expression data to the phylogenetic profiles introduces more noise than information. Thus, these two types of data should only be combined when there is evidence that the functional classification of interest is clearly reflected in both data sets.
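
One simple way to place both data types in a single learning algorithm is to concatenate the two feature vectors for each gene. The sketch below assumes this concatenation approach and adds a per-column standardization step; both are illustrative assumptions rather than the procedure confirmed by the abstract. The vector lengths 79 and 23 come from the abstract, while the data are random placeholders.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance; constant columns become zero."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mu) / sd

def combine_features(expression, profiles):
    """Concatenate standardized expression vectors and phylogenetic profiles per gene."""
    return np.hstack([standardize(expression), standardize(profiles)])

# Placeholder data: 10 genes, 79 expression values and 23 profile entries each.
rng = np.random.default_rng(1)
expression = rng.normal(size=(10, 79))
profiles = rng.integers(0, 2, size=(10, 23)).astype(float)

combined = combine_features(expression, profiles)  # one 102-element vector per gene
```

Because the expression block contributes more than three times as many columns as the profile block, a noisy expression signal can dominate the combined vector, which is in line with the abstract's caution about combining the two data types.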


Proceedings of the National Academy of Sciences of the United States of America | 2000

Knowledge-based analysis of microarray gene expression data by using support vector machines

Michael P. S. Brown; William Noble Grundy; David Yin-wei Lin; Nello Cristianini; Charles W. Sugnet; Terrence S. Furey; Manuel Ares; David Haussler


Archive | 1999

Support Vector Machine Classification of Microarray Gene Expression Data

M Brown; William Noble Grundy; David Yin-wei Lin; Nello Cristianini; Charles W. Sugnet; Manuel Ares; David Haussler


American Journal of Obstetrics and Gynecology | 2001

A high-throughput study of gene expression in preterm labor with a subtractive microarray approach.

Rebecca A. Muhle; Paul Pavlidis; William Noble Grundy; Emmet Hirsch


Archive | 2000

Knowledge-based analysis of microarray gene expression

M.P.S. Brown; William Noble Grundy; Dawei Lin; Nello Cristianini; Charles W. Sugnet; Terrence S. Furey; Ares; David Haussler


intelligent systems in molecular biology | 2001

Using mixtures of common ancestors for estimating the probabilities of discrete events in biological sequences

Eleazar Eskin; William Noble Grundy; Yoram Singer

Collaboration


Top co-authors of William Noble Grundy.

Paul Pavlidis
University of British Columbia

David Haussler
University of California

Terrence S. Furey
University of North Carolina at Chapel Hill

Eleazar Eskin
University of California

Emmet Hirsch
Northwestern University

Manuel Ares
University of California