Publication


Featured research published by Martin G. Reese.


Research in Computational Molecular Biology | 1997

Improved splice site detection in Genie

Martin G. Reese; Frank H. Eeckman; David Kulp; David Haussler

We present an improved splice site predictor for the genefinding program Genie. Genie is based on a generalized Hidden Markov Model (GHMM) that describes the grammar of a legal parse of a multi-exon gene in a DNA sequence. In Genie, probabilities are estimated for gene features by using dynamic programming to combine information from multiple content and signal sensors, including sensors that integrate matches to homologous sequences from a database. One of the hardest problems in genefinding is to determine the complete gene structure correctly. The splice site sensors are the key signal sensors that address this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. Using these novel sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. Experimental results in tests using a standard set of annotated genes showed that Genie identified 86% of coding nucleotides correctly with a specificity of 85%, versus 80% and 84% in the older system. In further splice site experiments, we also looked at correlations between splice site scores and intron and exon lengths, as well as at the effect of distance to the nearest splice site on false positive rates.


Genome Research | 2009

Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding

Kevin McKernan; Heather E. Peckham; Gina Costa; Stephen F. McLaughlin; Yutao Fu; Eric F. Tsung; Christopher Clouser; Cisyla Duncan; Jeffrey K. Ichikawa; Clarence Lee; Zheng Zhang; Swati Ranade; Eileen T. Dimalanta; Fiona Hyland; Tanya Sokolsky; Lei Zhang; Andrew Sheridan; Haoning Fu; Cynthia L. Hendrickson; Bin Li; Lev Kotler; Jeremy Stuart; Joel A. Malek; Jonathan M. Manning; Alena A. Antipova; Damon S. Perez; Michael P. Moore; Kathleen Hayashibara; Michael R. Lyons; Robert E. Beaudoin

We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.


Nature Biotechnology | 2012

Assuring the quality of next-generation sequencing in clinical laboratory practice

Amy S. Gargis; Lisa Kalman; Meredith W Berry; David P. Bick; David Dimmock; Tina Hambuch; Fei Lu; Elaine Lyon; Karl V. Voelkerding; Barbara A. Zehnbauer; Richa Agarwala; Sarah F. Bennett; Bin Chen; Ephrem L.H. Chin; John Compton; Soma Das; Daniel H. Farkas; Matthew J. Ferber; Birgit Funke; Manohar R. Furtado; Lilia Ganova-Raeva; Ute Geigenmüller; Sandra J Gunselman; Madhuri Hegde; Philip L. F. Johnson; Andrew Kasarskis; Shashikant Kulkarni; Thomas Lenk; Cs Jonathan Liu; Megan Manion

Amy S Gargis, Centers for Disease Control and Prevention; Lisa Kalman, Centers for Disease Control and Prevention; Meredith W Berry, SeqWright Inc; David P Bick, Medical College of Wisconsin; David P Dimmock, Medical College of Wisconsin; Tina Hambuch, Illumina Clinical Services; Fei Lu, SeqWright Inc; Elaine Lyon, University of Utah; Karl V Voelkerding, University of Utah; Barbara Zehnbauer, Emory University


Genome Research | 2011

A probabilistic disease-gene finder for personal genomes

Mark Yandell; Chad D. Huff; Hao Hu; Marc Singleton; Barry Moore; Jinchuan Xing; Lynn B. Jorde; Martin G. Reese

VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences. VAAST builds on existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy, and in an easy-to-use fashion. VAAST can score both coding and noncoding variants, evaluating the cumulative impact of both types of variants simultaneously. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases. VAAST thus has a much greater scope of use than any existing methodology. Here we demonstrate its ability to identify damaged genes using small cohorts (n = 3) of unrelated individuals, wherein no two share the same deleterious variants, and for common, multigenic diseases using as few as 150 cases.


Bioinformatics | 1999

Interpolated Markov chains for eukaryotic promoter recognition

Uwe Ohler; Stefan Harbeck; Heinrich Niemann; Elmar Nöth; Martin G. Reese

MOTIVATION: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein-encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems. RESULTS: A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou (Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.
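The general idea of interpolating Markov chains of different order can be sketched as below. This is a minimal illustration only, not the authors' implementation: the function names, the toy training sequences, the add-one smoothing, and the fixed interpolation weights are all assumptions made for the example, whereas the abstract describes estimating the interpolation parameters automatically.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

ALPHABET = "ACGT"

# counts[k][context][symbol] = number of times `symbol` followed the
# k-symbol `context` in the training data.
Counts = List[Dict[str, Dict[str, int]]]


def train_counts(sequences: Sequence[str], max_order: int) -> Counts:
    """Collect next-symbol counts for every order from 0 to max_order."""
    counts: Counts = [defaultdict(lambda: defaultdict(int))
                      for _ in range(max_order + 1)]
    for seq in sequences:
        for i, symbol in enumerate(seq):
            # Only orders with a full-length context at position i are used.
            for k in range(min(i, max_order) + 1):
                counts[k][seq[i - k:i]][symbol] += 1
    return counts


def interpolated_prob(counts: Counts, weights: Sequence[float],
                      context: str, symbol: str) -> float:
    """Weighted sum of order-k estimates; weights[k] belongs to order k."""
    if len(context) < len(weights) - 1:
        raise ValueError("context is shorter than the highest order")
    prob = 0.0
    for k, weight in enumerate(weights):
        ctx = context[len(context) - k:]  # last k symbols ("" for k = 0)
        table = counts[k].get(ctx, {})
        total = sum(table.values())
        # Add-one smoothing (an assumption) keeps unseen contexts defined.
        prob += weight * (table.get(symbol, 0) + 1) / (total + len(ALPHABET))
    return prob


# Hypothetical toy data and hand-picked weights that sum to 1.
training = ["ACGTACGTAC", "ACGGACGTTC"]
counts = train_counts(training, max_order=2)
weights = [0.2, 0.3, 0.5]
p_g = interpolated_prob(counts, weights, "AC", "G")
p_a = interpolated_prob(counts, weights, "AC", "A")
```

Because the weights sum to 1 and each per-order estimate is a proper distribution over the alphabet, the interpolated values also sum to 1 over the four nucleotides. A classifier in the spirit of the abstract would train one such model per sequence class and compare their scores for a candidate region.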


Genome Biology | 2010

A standard variation file format for human genome sequences

Martin G. Reese; Barry Moore; Colin R. Batchelor; Fidel Salas; Fiona Cunningham; Gabor T. Marth; Lincoln Stein; Paul Flicek; Mark Yandell; Karen Eilbeck

Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment.
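Since GVF extends GFF3, a record can be read as nine tab-delimited columns whose last column holds semicolon-separated tag=value attributes. The sketch below illustrates that layout under stated assumptions: the `Feature` type, the `parse_line` helper, and the example record (including its source name and coordinates) are hypothetical, and the parser ignores header pragmas, comments, and URL-escaping.

```python
from typing import Dict, NamedTuple

GFF3_COLUMNS = 9  # seqid, source, type, start, end, score, strand, phase, attributes


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_line(line: str) -> Feature:
    """Parse one tab-delimited feature line into a Feature record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != GFF3_COLUMNS:
        raise ValueError(f"expected {GFF3_COLUMNS} columns, got {len(fields)}")
    attributes: Dict[str, str] = {}
    for pair in fields[8].split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        attributes[key] = value
    return Feature(fields[0], fields[1], fields[2], int(fields[3]),
                   int(fields[4]), fields[5], fields[6], fields[7], attributes)


# Hypothetical record, made up for illustration only.
example = ("chr1\texample_caller\tSNV\t12345\t12345\t.\t+\t.\t"
           "ID=var1;Reference_seq=A;Variant_seq=G")
record = parse_line(example)
```

Here `record.type` holds the Sequence Ontology term for the variant, and `record.attributes` maps each tag to its value as an unparsed string.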


Genetic Epidemiology | 2013

VAAST 2.0: Improved Variant Classification and Disease-Gene Identification Using a Conservation-Controlled Amino Acid Substitution Matrix

Hao Hu; Chad D. Huff; Barry Moore; Steven Flygare; Martin G. Reese; Mark Yandell

The need for improved algorithmic support for variant prioritization and disease‐gene identification in personal genomes data is widely acknowledged. We previously presented the Variant Annotation, Analysis, and Search Tool (VAAST), which employs an aggregative variant association test that combines both amino acid substitution (AAS) and allele frequencies. Here we describe and benchmark VAAST 2.0, which uses a novel conservation‐controlled AAS matrix (CASM), to incorporate information about phylogenetic conservation. We show that the CASM approach improves VAAST's variant prioritization accuracy compared to its previous implementation, and compared to SIFT, PolyPhen‐2, and MutationTaster. We also show that VAAST 2.0 outperforms KBAC, WSS, SKAT, and variable threshold (VT) using published case‐control datasets for Crohn disease (NOD2), hypertriglyceridemia (LPL), and breast cancer (CHEK2). VAAST 2.0 also improves search accuracy on simulated datasets across a wide range of allele frequencies, population‐attributable disease risks, and allelic heterogeneity, factors that compromise the accuracies of other aggregative variant association tests. We also demonstrate that, although most aggregative variant association tests are designed for common genetic diseases, these tests can be easily adopted as rare Mendelian disease‐gene finders with a simple ranking‐by‐statistical‐significance protocol, and the performance compares very favorably to state‐of‐the‐art filtering approaches. The latter, despite their popularity, have suboptimal performance, especially as case sample size increases.


American Journal of Human Genetics | 2014

Phevor Combines Multiple Biomedical Ontologies for Accurate Identification of Disease-Causing Alleles in Single Individuals and Small Nuclear Families

Marc Singleton; Stephen L. Guthery; Karl V. Voelkerding; Karin Chen; Brett Kennedy; Rebecca L. Margraf; Jacob D. Durtschi; Karen Eilbeck; Martin G. Reese; Lynn B. Jorde; Chad D. Huff; Mark Yandell

Phevor integrates phenotype, gene function, and disease information with personal genomic data for improved power to identify disease-causing alleles. Phevor works by combining knowledge resident in multiple biomedical ontologies with the outputs of variant-prioritization tools. It does so by using an algorithm that propagates information across and between ontologies. This process enables Phevor to accurately reprioritize potentially damaging alleles identified by variant-prioritization tools in light of gene function, disease, and phenotype knowledge. Phevor is especially useful for single-exome and family-trio-based diagnostic analyses, the most commonly occurring clinical scenarios and ones for which existing personal genome diagnostic tools are most inaccurate and underpowered. Here, we present a series of benchmark analyses illustrating Phevor's performance characteristics. Also presented are three recent Utah Genome Project case studies in which Phevor was used to identify disease-causing alleles. Collectively, these results show that Phevor improves diagnostic accuracy not only for individuals presenting with established disease phenotypes but also for those with previously undescribed and atypical disease presentations. Importantly, Phevor is not limited to known diseases or known disease-causing alleles. As we demonstrate, Phevor can also use latent information in ontologies to discover genes and disease-causing alleles not previously associated with disease.


American Journal of Human Genetics | 2012

Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

Jeffrey M. Kidd; Simon Gravel; Jake K. Byrnes; Andres Moreno-Estrada; Shaila Musharoff; Katarzyna Bryc; Jeremiah D. Degenhardt; Abra Brisbin; Vrunda Sheth; Rong Chen; Stephen F. McLaughlin; Heather E. Peckham; Larsson Omberg; Christina A. Bormann Chung; Sarah Stanley; Kevin A. Pearlstein; Elizabeth Levandowsky; Suehelay Acevedo-Acevedo; Adam Auton; Alon Keinan; Victor Acuña-Alonzo; Rodrigo Barquera-Lozano; Samuel Canizales-Quinteros; Celeste Eng; Esteban G. Burchard; Archie Russell; Andrew R. Reynolds; Andrew G. Clark; Martin G. Reese; Stephen E. Lincoln

Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago.


Nature Biotechnology | 2014

A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data

Hao Hu; Jared C. Roach; Hilary Coon; Stephen L. Guthery; Karl V. Voelkerding; Rebecca L. Margraf; Jacob D. Durtschi; Sean V. Tavtigian; Shankaracharya; Wilfred Wu; Paul Scheet; Shuoguo Wang; Jinchuan Xing; Gustavo Glusman; Robert Hubley; Hong Li; Vidu Garg; Barry Moore; Leroy Hood; David J. Galas; Deepak Srivastava; Martin G. Reese; Lynn B. Jorde; Mark Yandell; Chad D. Huff

High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across studies of monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.

Collaboration


Dive into Martin G. Reese's collaboration.

Top Co-Authors

Gholson J. Lyon

Cold Spring Harbor Laboratory

Han Fang

Cold Spring Harbor Laboratory

Jason O'Rawe

Cold Spring Harbor Laboratory

Chad D. Huff

University of Texas MD Anderson Cancer Center

Hao Hu

University of Texas MD Anderson Cancer Center
