Publication


Featured research published by David Haussler.


Nucleic Acids Research | 2006

The UCSC genome browser database: update 2007

Robert M. Kuhn; Donna Karolchik; Ann S. Zweig; Heather Trumbower; Daryl J. Thomas; Archana Thakkapallayil; Charles W. Sugnet; Mario Stanke; Kayla E. Smith; Adam Siepel; Kate R. Rosenbloom; Brooke Rhead; Brian J. Raney; Andrew A. Pohl; Jakob Skou Pedersen; Fan Hsu; Angie S. Hinrichs; Rachel A. Harte; Mark Diekhans; Hiram Clawson; Gill Bejerano; Galt P. Barber; Robert Baertsch; David Haussler; William Kent

The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs.


Bioinformatics | 2000

Support vector machine classification and validation of cancer tissue samples using microarray expression data

Terrence S. Furey; Nello Cristianini; Nigel Duffy; David W. Bednarski; Michèl Schummer; David Haussler

MOTIVATION: DNA microarray experiments, generating thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data using support vector machines (SVMs). This analysis consists of both classification of the tissue samples and an exploration of the data for mis-labeled or questionable tissue results. RESULTS: We demonstrate the method in detail on samples consisting of ovarian cancer tissues, normal ovarian tissues, and other normal tissues. The dataset consists of expression experiment results for 97,802 cDNAs for each tissue. As a result of computational analysis, a tissue sample is discovered and confirmed to be wrongly labeled. Upon correction of this mistake and the removal of an outlier, perfect classification of tissues is achieved, but not with high confidence. We identify and analyse a subset of genes from the ovarian dataset whose expression is highly differentiated between the types of tissues. To show robustness of the SVM method, two previously published datasets from other types of tissues or cells are analysed. The results are comparable to those previously obtained. We show that other machine learning methods also perform comparably to the SVM on many of those datasets. AVAILABILITY: The SVM software is available at http://www.cs.columbia.edu/~bgrundy/svm.


Genome Research | 2012

GENCODE: The reference human genome annotation for The ENCODE Project

Jennifer Harrow; Adam Frankish; José Manuel Rodríguez González; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen Aken; Daniel Barrell; Amonida Zadissa; Stephen M. J. Searle; I. Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles A. Steward; Rachel A. Harte; Mike Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael L. Tress

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Cell | 2013

The somatic genomic landscape of glioblastoma.

Cameron Brennan; Roel G.W. Verhaak; Aaron McKenna; Benito Campos; Houtan Noushmehr; Sofie R. Salama; Siyuan Zheng; Debyani Chakravarty; J. Zachary Sanborn; Samuel H. Berman; Rameen Beroukhim; Brady Bernard; Chang-Jiun Wu; Giannicola Genovese; Ilya Shmulevich; Jill S. Barnholtz-Sloan; Lihua Zou; Rahulsimham Vegesna; Sachet A. Shukla; Giovanni Ciriello; W.K. Yung; Wei Zhang; Carrie Sougnez; Tom Mikkelsen; Kenneth D. Aldape; Darell D. Bigner; Erwin G. Van Meir; Michael D. Prados; Andrew E. Sloan; Keith L. Black

We describe the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer.


Journal of the ACM | 1989

Learnability and the Vapnik-Chervonenkis dimension

Anselm Blumer; Andrzej Ehrenfeucht; David Haussler; Manfred K. Warmuth

Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
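
To make the notion of the Vapnik-Chervonenkis dimension concrete, here is a minimal brute-force sketch for one simple concept class, closed intervals on the real line. This class is chosen for illustration and is not an example taken from the paper. A point set is shattered when every labeling of its points is realized by some concept, and the VC dimension is the size of the largest shattered set. The function names are hypothetical.

```python
from itertools import product
from typing import Sequence


def interval_realizes(points: Sequence[float], labels: Sequence[bool]) -> bool:
    """Return True if some closed interval [a, b] contains exactly the
    points labeled True (an empty interval covers the all-False labeling)."""
    positives = [p for p, lab in zip(points, labels) if lab]
    if not positives:
        return True
    lo, hi = min(positives), max(positives)
    # The tightest interval around the positives must exclude every negative.
    return all(lab or not (lo <= p <= hi) for p, lab in zip(points, labels))


def is_shattered(points: Sequence[float]) -> bool:
    """Check all 2^n labelings of the points by brute force."""
    return all(interval_realizes(points, labels)
               for labels in product([False, True], repeat=len(points)))


if __name__ == "__main__":
    print(is_shattered([1.0, 2.0]))       # two distinct points
    print(is_shattered([1.0, 2.0, 3.0]))  # labeling (True, False, True) fails
```

Two distinct points can be shattered by intervals but no three points can, so this class has VC dimension 2. That value is finite, which is the condition the abstract identifies as essential for distribution-free learnability.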


Neural Information Processing Systems | 1988

What Size Net Gives Valid Generalization?

Eric B. Baum; David Haussler

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ε ≤ 1/8. We show that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 − ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 − ε of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/ε) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 − ε fraction of the future test examples.
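
As a rough illustration of how the upper bound scales with network size, the sketch below evaluates the expression (W/ε) · ln(N/ε) for a given network. The constant hidden by the O(·) notation is not stated in the abstract, so the `constant` parameter (default 1.0) and the use of the natural logarithm are assumptions, and the function name is hypothetical. The result indicates a growth rate, not an exact sample size.

```python
import math


def sample_size_order(num_weights: int, num_nodes: int, epsilon: float,
                      constant: float = 1.0) -> float:
    """Evaluate constant * (W / eps) * ln(N / eps).

    This only shows the growth rate of the bound; `constant` and the
    natural-log base are illustrative assumptions, not values from the paper.
    """
    if not 0 < epsilon <= 1 / 8:
        raise ValueError("the result assumes 0 < epsilon <= 1/8")
    if num_weights <= 0 or num_nodes <= 0:
        raise ValueError("W and N must be positive")
    return constant * (num_weights / epsilon) * math.log(num_nodes / epsilon)


if __name__ == "__main__":
    # Hypothetical network: 10,000 weights, 100 nodes, 10% target error.
    print(round(sample_size_order(10_000, 100, 0.1)))
```

For a fixed ε and node count, the expression grows linearly in the number of weights W, while the dependence on the number of nodes N is only logarithmic.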


Nucleic Acids Research | 2004

The UCSC Table Browser data retrieval tool

Donna Karolchik; Angela S. Hinrichs; Terrence S. Furey; Krishna M. Roskin; Charles W. Sugnet; David Haussler; W. James Kent

The University of California Santa Cruz (UCSC) Table Browser (http://genome.ucsc.edu/cgi-bin/hgText) provides text-based access to a large collection of genome assemblies and annotation data stored in the Genome Browser Database. A flexible alternative to the graphical-based Genome Browser, this tool offers an enhanced level of query support that includes restrictions based on field values, free-form SQL queries and combined queries on multiple tables. Output can be filtered to restrict the fields and lines returned, and may be organized into one of several formats, including a simple tab-delimited file that can be loaded into a spreadsheet or database as well as advanced formats that may be uploaded into the Genome Browser as custom annotation tracks. The Table Browser User's Guide located on the UCSC website provides instructions and detailed examples for constructing queries and configuring output.
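
As an illustration of consuming tab-delimited output of this kind, the sketch below parses such text with Python's standard `csv` module and applies a field-value restriction on the client side. The column names and rows are made-up sample data, not actual Table Browser output, and the leading `#` on the header line is an assumption about the export layout. The function names are hypothetical.

```python
import csv
import io

# Made-up sample in a tab-delimited layout; real exports will have different columns.
SAMPLE = (
    "#name\tchrom\ttxStart\ttxEnd\n"
    "geneA\tchr1\t1000\t5000\n"
    "geneB\tchr2\t200\t900\n"
    "geneC\tchr1\t7000\t7600\n"
)


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse tab-delimited text whose first line is a header (optionally '#'-prefixed)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    return [dict(zip(header, row)) for row in reader if row]


def filter_rows(rows, field, value):
    """Keep only rows whose `field` equals `value`."""
    return [row for row in rows if row.get(field) == value]


if __name__ == "__main__":
    rows = read_rows(SAMPLE)
    for row in filter_rows(rows, "chrom", "chr1"):
        # Print each matching name with its span length.
        print(row["name"], int(row["txEnd"]) - int(row["txStart"]))
```

All values are read as strings, so numeric fields such as the start and end coordinates need an explicit conversion before arithmetic.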


Research in Computational Molecular Biology | 1997

Improved splice site detection in Genie

Martin G. Reese; Frank H. Eeckman; David Kulp; David Haussler

We present an improved splice site predictor for the genefinding program Genie. Genie is based on a generalized Hidden Markov Model (GHMM) that describes the grammar of a legal parse of a multi-exon gene in a DNA sequence. In Genie, probabilities are estimated for gene features by using dynamic programming to combine information from multiple content and signal sensors, including sensors that integrate matches to homologous sequences from a database. One of the hardest problems in genefinding is to determine the complete gene structure correctly. The splice site sensors are the key signal sensors that address this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. Using these novel sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. Experimental results in tests using a standard set of annotated genes showed that Genie identified 86% of coding nucleotides correctly with a specificity of 85%, versus 80% and 84% in the older system. In further splice site experiments, we also looked at correlations between splice site scores and intron and exon lengths, as well as at the effect of distance to the nearest splice site on false positive rates.
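
For readers unfamiliar with these measures, the sketch below computes nucleotide-level sensitivity and specificity from an annotated and a predicted coding mask. It assumes the convention common in gene-finding evaluations, where specificity means the fraction of predicted coding nucleotides that are truly coding, TP / (TP + FP). The abstract does not spell out its definition, and the masks are made-up data rather than results from Genie.

```python
from typing import Sequence, Tuple


def coding_accuracy(annotated: Sequence[bool],
                    predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN): annotated coding bases that were predicted.
    specificity = TP / (TP + FP): predicted coding bases that are annotated
    (an assumed convention; other fields define specificity differently).
    """
    if len(annotated) != len(predicted):
        raise ValueError("masks must have the same length")
    tp = sum(a and p for a, p in zip(annotated, predicted))
    fn = sum(a and not p for a, p in zip(annotated, predicted))
    fp = sum(p and not a for a, p in zip(annotated, predicted))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up 10-base example: True marks a coding nucleotide.
    annotated = [False, True, True, True, True, False, False, True, True, False]
    predicted = [False, True, True, True, False, False, True, True, True, False]
    print(coding_accuracy(annotated, predicted))
```

Under this convention a predictor can trade one measure against the other: calling more bases coding tends to raise sensitivity while lowering specificity, which is why the abstract reports both numbers together.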


Nature | 2010

Conserved role of intragenic DNA methylation in regulating alternative promoters.

Alika K. Maunakea; Raman P. Nagarajan; Mikhail Bilenky; Tracy Ballinger; Cletus D'souza; Shaun D. Fouse; Brett E. Johnson; Chibo Hong; Cydney Nielsen; Yongjun Zhao; Gustavo Turecki; Allen Delaney; Richard Varhol; Nina Thiessen; Ksenya Shchors; Vivi M. Heine; David H. Rowitch; Xiaoyun Xing; Chris Fiore; Maximiliaan Schillebeeckx; Steven J.M. Jones; David Haussler; Marco A. Marra; Martin Hirst; Ting Wang; Joseph F. Costello

Although it is known that the methylation of DNA in 5′ promoters suppresses gene expression, the role of DNA methylation in gene bodies is unclear. In mammals, tissue- and cell type-specific methylation is present in a small percentage of 5′ CpG island (CGI) promoters, whereas a far greater proportion occurs across gene bodies, coinciding with highly conserved sequences. Tissue-specific intragenic methylation might reduce, or, paradoxically, enhance transcription elongation efficiency. Capped analysis of gene expression (CAGE) experiments also indicate that transcription commonly initiates within and between genes. To investigate the role of intragenic methylation, we generated a map of DNA methylation from the human brain encompassing 24.7 million of the 28 million CpG sites. From the dense, high-resolution coverage of CpG islands, the majority of methylated CpG islands were shown to be in intragenic and intergenic regions, whereas less than 3% of CpG islands in 5′ promoters were methylated. The CpG islands in all three locations overlapped with RNA markers of transcription initiation, and unmethylated CpG islands also overlapped significantly with trimethylation of H3K4, a histone modification enriched at promoters. The general and CpG-island-specific patterns of methylation are conserved in mouse tissues. An in-depth investigation of the human SHANK3 locus and its mouse homologue demonstrated that this tissue-specific DNA methylation regulates intragenic promoter activity in vitro and in vivo. These methylation-regulated, alternative transcripts are expressed in a tissue- and cell type-specific manner, and are expressed differentially within a single cell type from distinct brain regions. These results support a major role for intragenic methylation in regulating cell context-specific alternative promoters in gene bodies.


Nucleic Acids Research | 2012

The UCSC Genome Browser database: extensions and updates 2011

Timothy R. Dreszer; Donna Karolchik; Ann S. Zweig; Angie S. Hinrichs; Brian J. Raney; Robert M. Kuhn; Laurence R. Meyer; Matthew C. Wong; Cricket A. Sloan; Kate R. Rosenbloom; Greg Roe; Brooke Rhead; Andy Pohl; Venkat S. Malladi; Chin H. Li; Katrina Learned; Vanessa M. Kirkup; Fan Hsu; Rachel A. Harte; Luvina Guruvadoo; Mary Goldman; Belinda Giardine; Pauline A. Fujita; Mark Diekhans; Melissa S. Cline; Hiram Clawson; Galt P. Barber; David Haussler; W. James Kent

The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced ‘track data hubs’, which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.

Collaboration


Top Co-Authors

W. James Kent
University of California

Benedict Paten
University of California

Brian J. Raney
University of California

Jingchun Zhu
University of California

Robert M. Kuhn
University of California