Publication


Featured research published by David Kulp.


Research in Computational Molecular Biology | 1997

Improved splice site detection in Genie

Martin G. Reese; Frank H. Eeckman; David Kulp; David Haussler

We present an improved splice site predictor for the genefinding program Genie. Genie is based on a generalized Hidden Markov Model (GHMM) that describes the grammar of a legal parse of a multi-exon gene in a DNA sequence. In Genie, probabilities are estimated for gene features by using dynamic programming to combine information from multiple content and signal sensors, including sensors that integrate matches to homologous sequences from a database. One of the hardest problems in genefinding is to determine the complete gene structure correctly. The splice site sensors are the key signal sensors that address this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. Using these novel sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. Experimental results in tests using a standard set of annotated genes showed that Genie identified 86% of coding nucleotides correctly with a specificity of 85%, versus 80% and 84% in the older system. In further splice site experiments, we also looked at correlations between splice site scores and intron and exon lengths, as well as at the effect of distance to the nearest splice site on false positive rates.
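The abstract reports nucleotide-level accuracy figures without defining them. The sketch below assumes the convention common in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the function name and the 0/1 string encoding of coding positions are hypothetical and chosen only for illustration.

```python
def nucleotide_sn_sp(true_coding, predicted_coding):
    """Nucleotide-level sensitivity and specificity.

    Both arguments are equal-length strings of '0'/'1', where '1' marks a
    coding nucleotide (a hypothetical encoding used only in this sketch).
    Assumes the gene-prediction convention Sp = TP / (TP + FP).
    """
    if len(true_coding) != len(predicted_coding):
        raise ValueError("annotations must have the same length")
    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t == "1" and p == "1")
    fn = sum(1 for t, p in pairs if t == "1" and p == "0")
    fp = sum(1 for t, p in pairs if t == "0" and p == "1")
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


# A toy 10-nucleotide region: the prediction is shifted by one position.
sn, sp = nucleotide_sn_sp("0011110000", "0001111000")
```

In this toy region three of the four true coding nucleotides are recovered and three of the four predicted coding nucleotides are correct, so both measures come out to 0.75.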


Science | 2007

Genome sequence of Aedes aegypti, a major arbovirus vector

Vishvanath Nene; Jennifer R. Wortman; Daniel John Lawson; Brian J. Haas; Chinnappa D. Kodira; Zhijian Jake Tu; Brendan J. Loftus; Zhiyong Xi; Karyn Megy; Manfred Grabherr; Quinghu Ren; Evgeny M. Zdobnov; Neil F. Lobo; Kathryn S. Campbell; Susan E. Brown; Maria F. Bonaldo; Jingsong Zhu; Steven P. Sinkins; David G. Hogenkamp; Paolo Amedeo; Peter Arensburger; Peter W. Atkinson; Shelby Bidwell; Jim Biedler; Ewan Birney; Robert V. Bruggner; Javier Costas; Monique R. Coy; Jonathan Crabtree; Matt Crawford

We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at ∼1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of ∼4 to 6 increase in average gene length and in sizes of intergenic regions relative to An. gambiae and Drosophila melanogaster. Nonetheless, chromosomal synteny is generally maintained among all three insects, although conservation of orthologous gene order is higher (by a factor of ∼2) between the mosquito species than between either of them and the fruit fly. An increase in genes encoding odorant binding, cytochrome P450, and cuticle domains relative to An. gambiae suggests that members of these protein families underpin some of the biological differences between the two mosquito species.


Nucleic Acids Research | 2003

NetAffx: Affymetrix probesets and annotations

Guoying Liu; Ann E. Loraine; Ron Shigeta; Melissa S. Cline; Jill Cheng; Venu Valmeekam; Shaw Sun; David Kulp; Michael A. Siani-Rose

NetAffx (http://www.affymetrix.com) details and annotates probesets on Affymetrix GeneChip microarrays. These annotations include (i) static information specific to the probeset composition; (ii) sequence annotations extracted from public databases; and (iii) protein sequence-level annotations derived from public domain programs, as well as libraries of hidden Markov models (HMMs) developed at Affymetrix. For each probeset, NetAffx lists the probe sequences, and the consensus sequence interrogated by the probes; for the larger chip sets, interactive maps display this sequence data in genomic context. Sequence annotations include Gene Ontology (GO) terms and depiction of GO graph relationships; predicted protein domains and motifs; orthologous sequences; links to relevant pathways; and links to public databases including UniGene, LocusLink, SWISS-PROT and OMIM.


Proceedings of the National Academy of Sciences of the United States of America | 2003

Probe selection for high-density oligonucleotide arrays

Rui Mei; Earl Hubbell; Stefan Bekiranov; Mike Mittmann; Fred C. Christians; Mei-Mei Shen; Gang Lu; Joy Fang; Wei-Min Liu; Tom Ryder; Paul Kaplan; David Kulp; Teresa Webster

High-density oligonucleotide microarrays enable simultaneous monitoring of expression levels of tens of thousands of transcripts. For accurate detection and quantitation of transcripts in the presence of cellular mRNA, it is essential to design microarrays whose oligonucleotide probes produce hybridization intensities that accurately reflect the concentration of original mRNA. We present a model-based approach that predicts optimal probes by using sequence and empirical information. We constructed a thermodynamic model for hybridization behavior and determined the influence of empirical factors on the effective fitting parameters. We designed Affymetrix GeneChip probe arrays that contained all 25-mer probes for hundreds of human and yeast transcripts and collected data over a 4,000-fold concentration range. Multiple linear regression models were built to predict hybridization intensities of each probe at given target concentrations, and each intensity profile is summarized by a probe response metric. We selected probe sets to represent each transcript that were optimized with respect to responsiveness, independence (degree to which probe sequences are nonoverlapping), and uniqueness (lack of similarity to sequences in the expressed genomic background). We show that this approach is capable of selecting probes with high sensitivity and specificity for high-density oligonucleotide arrays.


Bioinformatics | 2003

Algorithms for large-scale genotyping microarrays.

Wei-min Liu; Xiaojun Di; Geoffrey Yang; Hajime Matsuzaki; Jing Huang; Rui Mei; Thomas B. Ryder; Teresa A. Webster; Shoulian Dong; Guoying Liu; Keith W. Jones; Giulia C. Kennedy; David Kulp

MOTIVATION: Analysis of many thousands of single nucleotide polymorphisms (SNPs) across the whole genome is crucial for efficiently mapping disease genes and understanding susceptibility to diseases, drug efficacy and side effects for different populations and individuals. High-density oligonucleotide microarrays make such analysis possible at reasonable cost. Such analysis requires accurate, reliable methods for feature extraction, classification, statistical modeling and filtering.

RESULTS: We propose the modified partitioning around medoids as a classification method for relative allele signals. We use the average silhouette width, separation and other quantities as quality measures for genotyping classification. We form robust statistical models based on the classification results and use these models to make genotype calls and calculate quality measures of calls. We apply our algorithms to several different genotyping microarrays. We use reference types, informative Mendelian relationships in families, and leave-one-out cross validation to verify our results. The concordance rates with the single base extension reference types are 99.36% for the SNPs on autosomes and 99.64% for the SNPs on sex chromosomes. The concordance of the leave-one-out test is over 99.5% and is 99.9% higher for AA, AB and BB cells. We also provide a method to determine the gender of a sample based on the heterozygous call rate of SNPs on the X chromosome. See http://www.affymetrix.com for further information. The microarray data will also be available from the Affymetrix web site.

AVAILABILITY: The algorithms will be available commercially in the Affymetrix software package.
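The average silhouette width mentioned in the abstract can be illustrated with a short sketch. This is the standard silhouette definition applied to one-dimensional signals, not the authors' implementation; the function name, the toy relative allele signal values and the genotype labels are all assumptions made for the example.

```python
def silhouette_widths(values, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for 1-D observations.

    a: mean distance from i to the other members of its own cluster.
    b: smallest mean distance from i to the members of any other cluster.
    Requires at least two distinct labels.
    """
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    widths = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # The point itself contributes distance 0, so divide by len - 1.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Hypothetical relative allele signals for six samples at one SNP.
signals = [0.05, 0.10, 0.50, 0.55, 0.95, 0.90]
good_calls = ["BB", "BB", "AB", "AB", "AA", "AA"]
bad_calls = ["BB", "AB", "AB", "BB", "AA", "AA"]

avg_good = sum(silhouette_widths(signals, good_calls)) / len(signals)
avg_bad = sum(silhouette_widths(signals, bad_calls)) / len(signals)
```

Well-separated genotype clusters give an average width close to 1, while the deliberately scrambled calls score lower, which is why the quantity can serve as a quality measure for a clustering.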


Journal of Biopharmaceutical Statistics | 2004

A Knowledge-Based Clustering Algorithm Driven by Gene Ontology

Jill Cheng; Melissa S. Cline; John Martin; David Finkelstein; Tarif Awad; David Kulp; Michael A. Siani-Rose

We have developed an algorithm for inferring the degree of similarity between genes by using the graph-based structure of Gene Ontology (GO). We applied this knowledge-based similarity metric to a clique-finding algorithm for detecting sets of related genes with biological classifications. We also combined it with an expression-based distance metric to produce a co-cluster analysis, which accentuates genes with both similar expression profiles and similar biological characteristics and identifies gene clusters that are more stable and biologically meaningful. These algorithms are demonstrated in the analysis of MPRO cell differentiation time series experiments.


BMC Genomics | 2006

Causal inference of regulator-target pairs by gene mapping of expression phenotypes.

David Kulp; Manjunatha Jagalur

Background: Correlations between polymorphic markers and observed phenotypes provide the basis for mapping traits in quantitative genetics. When the phenotype is gene expression, then loci involved in regulatory control can theoretically be implicated. Recent efforts to construct gene regulatory networks from genotype and gene expression data have shown that biologically relevant networks can be achieved from an integrative approach. In this paper, we consider the problem of identifying individual pairs of genes in a direct or indirect, causal, trans-acting relationship.

Results: Inspired by epistatic models of multi-locus quantitative trait locus (QTL) mapping, we propose a unified model of expression and genotype to identify quantitative trait genes (QTG) by extending the conventional linear model to include both genotype and expression of regulator genes and their interactions. The model provides mapping of specific genes in contrast to standard linkage approaches that implicate large QTL intervals typically containing tens of genes. In simulations, we found that the method can often detect weak trans-acting regulators amid the background noise of thousands of traits and is robust to transcription models containing multiple regulator genes. We reanalyze several pleiotropic loci derived from a large set of yeast matings and identify a likely alternative regulator not previously published. However, we also found that many regulators cannot be so easily mapped due to the presence of cis-acting QTLs on the regulators, which induce close linkage among small neighborhoods of genes. QTG mapped regulator-target pairs linked to ARN1 were combined to form a regulatory module, which we observed to be highly enriched in iron homeostasis related genes and contained several causally directed links that had not been identified in other automatic reconstructions of that regulatory module. Finally, we also confirm the surprising, previously published results that regulators controlling gene expression are not enriched for transcription factors, but we do show that our more precise mapping model reveals functional enrichment for several other biological processes related to the regulation of the cell.

Conclusion: By incorporating interacting expression and genotype, our QTG mapping method can identify specific regulator genes in contrast to standard QTL interval mapping. We have shown that the method can recover biologically significant regulator-target pairs and the approach leads to a general framework for inducing a regulatory module network topology of directed and undirected edges that can be used to identify leads in pathway analysis.
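One way to read the "unified model of expression and genotype" in the abstract is as a linear regression with an interaction term. The form below is an illustrative sketch, not the paper's exact specification: the symbols $y_t$ (expression of the candidate target gene), $g$ (marker genotype), $x_r$ (expression of the candidate regulator gene) and the coefficients $\beta_i$ are notation assumed here.

```latex
y_t = \beta_0 + \beta_1\, g + \beta_2\, x_r + \beta_3\,(g \cdot x_r) + \varepsilon
```

Under this reading, a conventional genotype-only QTL model corresponds to the special case $\beta_2 = \beta_3 = 0$, and evidence for a specific regulator gene comes from the terms involving $x_r$.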


BMC Genomics | 2010

A novel multifunctional oligonucleotide microarray for Toxoplasma gondii

Amit Bahl; Paul H. Davis; Michael S. Behnke; Florence Dzierszinski; Manjunatha Jagalur; Feng Chen; Dhanasekaran Shanmugam; Michael W. White; David Kulp; David S. Roos

Background: Microarrays are invaluable tools for genome interrogation, SNP detection, and expression analysis, among other applications. Such broad capabilities would be of value to many pathogen research communities, although the development and use of genome-scale microarrays is often a costly undertaking. Therefore, effective methods for reducing unnecessary probes while maintaining or expanding functionality would be relevant to many investigators.

Results: Taking advantage of available genome sequences and annotation for Toxoplasma gondii (a pathogenic parasite responsible for illness in immunocompromised individuals) and Plasmodium falciparum (a related parasite responsible for severe human malaria), we designed a single oligonucleotide microarray capable of supporting a wide range of applications at relatively low cost, including genome-wide expression profiling for Toxoplasma, and single-nucleotide polymorphism (SNP)-based genotyping of both T. gondii and P. falciparum. Expression profiling of the three clonotypic lineages dominating T. gondii populations in North America and Europe provides a first comprehensive view of the parasite transcriptome, revealing that ~49% of all annotated genes are expressed in parasite tachyzoites (the acutely lytic stage responsible for pathogenesis) and 26% of genes are differentially expressed among strains. A novel design utilizing few probes provided high confidence genotyping, used here to resolve recombination points in the clonal progeny of sexual crosses. Recent sequencing of additional T. gondii isolates identifies >620 K new SNPs, including ~11 K that intersect with expression profiling probes, yielding additional markers for genotyping studies, and further validating the utility of a combined expression profiling/genotyping array design. Additional applications facilitating SNP and transcript discovery, alternative statistical methods for quantifying gene expression, etc. are also pursued at pilot scale to inform future array designs.

Conclusions: In addition to providing an initial global view of the T. gondii transcriptome across major lineages and permitting detailed resolution of recombination points in a historical sexual cross, the multifunctional nature of this array also allowed opportunities to exploit probes for purposes beyond their intended use, enhancing analyses. This array is in widespread use by the T. gondii research community, and several aspects of the design strategy are likely to be useful for other pathogens.


Journal of Eukaryotic Microbiology | 2003

Analysis of Chlamydomonas reinhardtii Genome Structure Using Large-Scale Sequencing of Regions on Linkage Groups I and III

Jin Billy Li; Shaoping Lin; Honggui Jia; Hongmin Wu; Bruce A. Roe; David Kulp; Gary D. Stormo; Susan K. Dutcher

Chlamydomonas reinhardtii is a unicellular green alga that has been used as a model organism for the study of flagella and basal bodies as well as photosynthesis. This report analyzes finished genomic DNA sequence for 0.5% of the nuclear genome. We have used three gene prediction programs as well as EST and protein homology data to estimate the total number of genes in Chlamydomonas to be between 12,000 and 16,400. Chlamydomonas appears to have many more genes than any other unicellular organism sequenced to date. Twenty-seven percent of the predicted genes have significant identity to both ESTs and to known proteins in other organisms, 32% of the predicted genes have significant identity to ESTs alone, and 14% have significant similarity to known proteins in other organisms. For gene prediction in Chlamydomonas, GreenGenie appeared to have the highest sensitivity and specificity at the exon level, scoring 71% and 82%, respectively. Two new alternative splicing events were predicted by aligning Chlamydomonas ESTs to the genomic sequence. Finally, recombination differs between the two sequenced contigs. The 350-Kb of the Linkage group III contig is devoid of recombination, while the Linkage group I contig is 30 map units long over 33-kb.


Pacific Symposium on Biocomputing | 2003

The effects of alternative splicing on transmembrane proteins in the mouse genome.

Melissa S. Cline; Ron Shigeta; Raymond Wheeler; Michael A. Siani-Rose; David Kulp; Ann E. Loraine

Alternative splicing is a major source of variety in mammalian mRNAs, yet many questions remain on its downstream effects on protein function. To this end, we assessed the impact of gene structure and splice variation on signal peptide and transmembrane regions in proteins. Transmembrane proteins perform several key functions in cell signaling and transport, with their function tied closely to their transmembrane architecture. Signal peptides and transmembrane regions both provide key information on protein localization. Thus, any modification to such regions will likely alter protein destination and function. We applied TMHMM and SignalP to a nonredundant set of proteins, and assessed the effects of gene structure and alternative splicing on predicted transmembrane and signal peptide regions. These regions were altered by alternative splicing in roughly half of the cases studied. Transmembrane regions are divided by introns slightly less often than expected given gene structure and transmembrane region size. However, the transmembrane regions in single-pass transmembranes are divided substantially less often than expected. This suggests that intron placement might be subject to some evolutionary pressure to preserve function in these signaling proteins. The data described in this paper is available online at http://www.affymetrix.com/community/publications/affymetrix/tmsplice/.

Collaboration


Dive into David Kulp's collaborations.

Top Co-Authors

David Haussler

University of California

Ann E. Loraine

University of North Carolina at Charlotte

Manjunatha Jagalur

University of Massachusetts Amherst

Martin G. Reese

Lawrence Berkeley National Laboratory
