Publication


Featured research published by Ross A. Lippert.


Proceedings of the National Academy of Sciences of the United States of America | 2004

Whole-genome shotgun assembly and comparison of human genome assemblies

Sorin Istrail; Granger Sutton; Liliana Florea; Aaron L. Halpern; Clark M. Mobarry; Ross A. Lippert; Brian Walenz; Hagit Shatkay; Ian M. Dew; Jason R. Miller; Michael Flanigan; Nathan Edwards; Randall Bolanos; Daniel Fasulo; Bjarni V. Halldórsson; Sridhar Hannenhalli; Russell Turner; Shibu Yooseph; Fu Lu; Deborah Nusskern; Bixiong Shue; Xiangqun Holly Zheng; Fei Zhong; Arthur L. Delcher; Daniel H. Huson; Saul Kravitz; Laurent Mouchard; Knut Reinert; Karin A. Remington; Andrew G. Clark

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304–1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860–921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.


Cancer Research | 2004

Estrogen Receptor Genotypes and Haplotypes Associated with Breast Cancer Risk

Bert Gold; Francis Kalush; Julie Bergeron; Kevin Scott; Nandita Mitra; Kelly Wilson; Nathan A. Ellis; Helen Huang; Michael Chen; Ross A. Lippert; Bjarni V. Halldórsson; Beth Woodworth; Thomas J. White; Andrew G. Clark; Fritz F. Parl; Samuel Broder; Michael Dean; Kenneth Offit

Nearly one in eight US women will develop breast cancer in their lifetime. Most breast cancer is not associated with a hereditary syndrome, occurs in postmenopausal women, and is estrogen and progesterone receptor-positive. Estrogen exposure is an epidemiologic risk factor for breast cancer and estrogen is a potent mammary mitogen. We studied single nucleotide polymorphisms (SNPs) in estrogen receptors in 615 healthy subjects and 1011 individuals with histologically confirmed breast cancer, all from New York City. We analyzed 13 SNPs in the progesterone receptor gene (PGR), 17 SNPs in the estrogen receptor 1 gene (ESR1), and 8 SNPs in the estrogen receptor 2 gene (ESR2). We observed three common haplotypes in ESR1 that were associated with a decreased risk for breast cancer [odds ratio (OR), ∼0.4; 95% confidence interval (CI), 0.2–0.8; P < 0.01]. Another haplotype was associated with an increased risk of breast cancer (OR, 2.1; 95% CI, 1.2–3.8; P < 0.05). A unique risk haplotype was present in ∼7% of older Ashkenazi Jewish study subjects (OR, 1.7; 95% CI, 1.2–2.4; P < 0.003). We narrowed the ESR1 risk haplotypes to the promoter region and first exon. We define several other haplotypes in Ashkenazi Jews in both ESR1 and ESR2 that may elevate susceptibility to breast cancer. In contrast, we found no association between any PGR variant or haplotype and breast cancer. Genetic epidemiology study replication and functional assays of the haplotypes should permit a better understanding of the role of steroid receptor genetic variants and breast cancer risk.


Research in Computational Molecular Biology | 2002

A Survey of Computational Methods for Determining Haplotypes

Bjarni V. Halldórsson; Vineet Bafna; Nathan Edwards; Ross A. Lippert; Shibu Yooseph; Sorin Istrail

It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. Single nucleotide polymorphisms (SNPs) are the most common form of genomic variation. Haplotypes have been suggested as one means for reducing the complexity of studying SNPs. In this paper we review some of the computational approaches that have been taken for determining haplotypes and suggest new approaches.


Proceedings of the National Academy of Sciences of the United States of America | 2002

Distributional regimes for the number of k-word matches between two random sequences

Ross A. Lippert; Haiyan Huang; Michael S. Waterman

When comparing two sequences, a natural approach is to count the number of k-letter words the two sequences have in common. No positional information is used in the count, but it has the virtue that the comparison time is linear with sequence length. For this reason this statistic D2 and certain transformations of D2 are used for EST sequence database searches. In this paper we begin the rigorous study of the statistical distribution of D2. Using an independence model of DNA sequences, we derive limiting distributions by means of the Stein and Chen–Stein methods and identify three asymptotic regimes, including compound Poisson and normal. The compound Poisson distribution arises when the word size k is large and word matches are rare. The normal distribution arises when the word size is small and matches are common. Explicit expressions for what is meant by large and small word sizes are given in the paper. However, when word size is small and the letters are uniformly distributed, the anticipated limiting normal distribution does not always occur. In this situation the uniform distribution provides the exception to other letter distributions. Therefore a naive, one distribution fits all, approach to D2 statistics could easily create serious errors in estimating significance.
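The counting step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes D2 is the number of position pairs (i, j) at which the k-letter word starting at i in the first sequence equals the k-letter word starting at j in the second, and the function names `kmer_counts` and `d2` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-letter word in seq (one linear pass)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """Number of matching k-word position pairs between a and b.

    Assumed definition: sum over words w of count_a(w) * count_b(w).
    No positional information is kept, only the per-word counts.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "CGTA", 2))
```

Because each sequence is scanned once and the word tables are combined in a single pass, the work grows linearly with sequence length for fixed k, which is the property the abstract highlights.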


SIGKDD Explorations | 2002

Rule-based extraction of experimental evidence in the biomedical domain: the KDD Cup 2002 (task 1)

Yizhar Regev; Michal Finkelstein-Landau; Ronen Feldman; Maya Gorodetsky; Xin Zheng; Samuel Levy; Rosane Charlab; Charles Lawrence; Ross A. Lippert; Qing Zhang; Hagit Shatkay

Below we describe the winning system that we built for the KDD Cup 2002 Task 1 competition. Our system is a Rule-based Information Extraction (IE) system. It combines pattern matching, Natural Language Processing (NLP) tools, semantic constraints based on the domain and the specific task, and a post-processing stage for making the final curation decision based on the various evidence (positive and negative) found within the document. Development and implementation were made using the DIAL IE language and the ClearLab development environment. The results achieved were significantly superior to those achieved using categorization approaches.


Journal of Computational Biology | 2005

Space-Efficient Whole Genome Comparisons with Burrows–Wheeler Transforms

Ross A. Lippert

The starting point for any alignment of mammalian genomes is the computation of exact matches satisfying various criteria. Time-efficient, O(n), data structures for this computation, such as the suffix tree, require O(n log(n)) space, several times the space of the genomes themselves. Thus, any reasonable whole-genome comparative project finds itself requiring tens of Gigabytes of RAM to maintain time-efficiency. This is beyond most modern workstations. With a new data structure, the compressed suffix array (CSA) implemented via the Burrows-Wheeler transform, we can trade time-efficiency for space-efficiency, taking O(n log(n)) time, but running in O(n) space, typically in total space less than or equal to that of the genomes themselves. If space is more expensive than time, this is an appropriate approach to consider. The most space-efficient implementation of this data structure requires 5 bits per nucleotide character to build on-line, in the worst case, and 2.5 bits per character to store once built. We present a description of this data structure and how it is used to obtain matches. An implementation (called bbbwt) is demonstrated by aligning two mammalian genomes on a modest workstation equipped with under 2 GB of free RAM in time superior to that of the implementations of other data structures.
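To make the idea of exact matching through the Burrows-Wheeler transform concrete, here is a toy sketch. It is not the bbbwt implementation and makes no attempt at space efficiency: the transform is built by sorting all rotations, and matches are counted with the standard backward-search recurrence. The function names and the `$` end marker are assumptions made for illustration.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy construction).

    Assumes '$' does not occur in text and sorts before every other symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_matches(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first_row[c] = number of symbols in the text strictly smaller than c
    first_row = {}
    total = 0
    for c in sorted(set(bwt_str)):
        first_row[c] = total
        total += bwt_str.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]; a real index would precompute this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current range of sorted rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("ACGTACGTAC")
    print(index, count_matches(index, "ACG"))
```

A compressed suffix array replaces the naive `occ` scan and the rotation sort with compact sampled tables, which is where the space savings described in the abstract come from.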


Journal of Computational Physics | 1998

Multiscale Computation with Interpolating Wavelets

Ross A. Lippert; T. A. Arias; Alan Edelman

Multiresolution analyses based upon interpolets, interpolating scaling functions introduced by Deslauriers and Dubuc, are particularly well-suited to physical applications because they allow exact recovery of the multiresolution representation of a function from its sample values on a finite set of points in space. We present a detailed study of the application of wavelet concepts to physical problems expressed in such bases. The manuscript describes algorithms for the associated transforms which for properly constructed grids of variable resolution compute correctly without having to introduce extra grid points. We demonstrate that for the application of local homogeneous operators in such bases, the nonstandard multiply of Beylkin, Coifman, and Rokhlin also proceeds exactly for inhomogeneous grids of appropriate form. To obtain less stringent conditions on the grids, we generalize the nonstandard multiply so that communication may proceed between nonadjacent levels. The manuscript concludes with timing comparisons against naïve algorithms and an illustration of the scale-independence of the convergence rate of the conjugate gradient solution of Poisson's equation using a simple preconditioning.
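The interpolating property can be illustrated with the well-known four-point Deslauriers-Dubuc scheme, which predicts a midpoint value from four neighboring coarse samples with weights (-1, 9, 9, -1)/16. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it treats the detail coefficient at a fine grid point as the sample value minus that prediction, works on a uniform one-dimensional grid, and skips boundary points. The function names are hypothetical.

```python
def dd4_predict(coarse, i):
    """Four-point Deslauriers-Dubuc prediction midway between coarse[i] and coarse[i+1]."""
    return (-coarse[i - 1] + 9 * coarse[i] + 9 * coarse[i + 1] - coarse[i + 2]) / 16


def detail_coefficients(fine):
    """Interior detail coefficients: odd-indexed samples minus their prediction.

    fine holds samples on a uniform grid; even indices form the coarse grid.
    Points too close to the boundary for the 4-point stencil are skipped.
    """
    coarse = fine[::2]
    odd = fine[1::2]
    return [odd[i] - dd4_predict(coarse, i) for i in range(1, len(coarse) - 2)]


if __name__ == "__main__":
    cubic = [(x / 2) ** 3 - 2 * (x / 2) + 1 for x in range(17)]
    kink = [abs(x / 2 - 4.5) for x in range(17)]
    print(detail_coefficients(cubic))  # all zero: cubics are reproduced exactly
    print(detail_coefficients(kink))   # nonzero only near the kink
```

Since the coarse samples are kept unchanged and the details are computed directly from sample values, the representation is recovered from point samples alone, which is the feature the abstract emphasizes.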


Journal of Computational Biology | 2005

A Space-Efficient Construction of the Burrows–Wheeler Transform for Genomic Data

Ross A. Lippert; Clark M. Mobarry; Brian Walenz

Algorithms for exact string matching have substantial application in computational biology. Time-efficient data structures which support a variety of exact string matching queries, such as the suffix tree and the suffix array, have been applied to such problems. As sequence databases grow, more space-efficient approaches to exact matching are becoming more important. One such data structure, the compressed suffix array (CSA), based on the Burrows-Wheeler transform, has been shown to require memory which is nearly equal to the memory requirements of the original database, while supporting common sorts of query problems time efficiently. However, building a CSA from a sequence in efficient space and time is challenging. In 2002, the first space-efficient CSA construction algorithm was presented. That implementation used (1 + 2 log2 |Σ|)(1 + ε) bits per character (where ε is a small fraction). The construction algorithm ran in as much as twice that space, in O(|Σ| n log(n)) time. We have created an implementation which can also achieve these asymptotic bounds, but for small alphabets, and only uses 1/2 (1 + |Σ|)(1 + ε) bits per character, a factor of 2 less space for nucleotide alphabets. We present time and space results for the CSA construction and querying of our implementation on publicly available genome data which demonstrate the practicality of this approach.
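The two space bounds quoted in this abstract can be checked by hand for a nucleotide alphabet. The worked arithmetic below assumes that the alphabet-size symbol in the abstract denotes |Σ|, that |Σ| = 4 for DNA, and that the leading "1/2" multiplies the whole second expression.

```latex
\begin{align*}
\text{earlier construction:}\quad
  (1 + 2\log_2 |\Sigma|)(1+\epsilon)
  &= (1 + 2\cdot 2)(1+\epsilon) = 5\,(1+\epsilon) \ \text{bits per character},\\
\text{this implementation:}\quad
  \tfrac{1}{2}\,(1 + |\Sigma|)(1+\epsilon)
  &= \tfrac{1}{2}\cdot 5\,(1+\epsilon) = 2.5\,(1+\epsilon) \ \text{bits per character},\\
\text{ratio:}\quad
  \frac{5\,(1+\epsilon)}{2.5\,(1+\epsilon)} &= 2 .
\end{align*}
```

Under these assumptions the ratio is exactly 2, in line with the stated factor of 2 for nucleotide alphabets.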


Discrete Mathematics & Theoretical Computer Science | 2003

Combinatorial problems arising in SNP and haplotype analysis

Bjarni V. Halldórsson; Vineet Bafna; Nathan Edwards; Ross A. Lippert; Shibu Yooseph; Sorin Istrail

It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. This paper presents a number of algorithmic and combinatorial problems that arise when studying a very common form of genomic variation, single nucleotide polymorphisms (SNPs). We review recent results and present challenging open problems.


Research in Computational Molecular Biology | 2004

Finding anchors for genomic sequence comparison

Ross A. Lippert; Xiaoyue Zhao; Liliana Florea; Clark M. Mobarry; Sorin Istrail

Recent sequencing of the human and other mammalian genomes has brought about the necessity to align them, to identify and characterize their commonalities and differences. Programs that align whole genomes generally use a seed-and-extend technique, starting from exact or near-exact matches and selecting a reliable subset of these, called anchors, and then filling in the remaining portions between the anchors using a combination of local and global alignment algorithms, but their choices for the parameters so far have been primarily heuristic. We present a statistical framework and practical methods for selecting a set of matches that is both sensitive and specific and can constitute a reliable set of anchors for a one-to-one mapping of two genomes from which a whole-genome alignment can be built. Starting from exact matches, we introduce a novel per-base repeat annotation, the

Collaboration

Top Co-Authors

Russell Schwartz
Carnegie Mellon University

Ryan Rifkin
Massachusetts Institute of Technology

Alan Edelman
Massachusetts Institute of Technology