Robert L. Strausberg | Researchain

Publication

Featured research published by Robert L. Strausberg.

Science | 2008

An Integrated Genomic Analysis of Human Glioblastoma Multiforme

D. Williams Parsons; Siân Jones; Xiaosong Zhang; Jimmy Lin; Rebecca J. Leary; Philipp Angenendt; Parminder Mankoo; Hannah Carter; I-Mei Siu; Gary L. Gallia; Alessandro Olivi; Roger E. McLendon; B. Ahmed Rasheed; Stephen T. Keir; Tatiana Nikolskaya; Yuri Nikolsky; Dana Busam; Hanna Tekleab; Luis A. Diaz; James Hartigan; Doug Smith; Robert L. Strausberg; Suely Kazue Nagahashi Marie; Sueli Mieko Oba Shinjo; Hai Yan; Gregory J. Riggins; Darell D. Bigner; Rachel Karchin; Nick Papadopoulos; Giovanni Parmigiani

Glioblastoma multiforme (GBM) is the most common and lethal type of brain cancer. To identify the genetic alterations in GBMs, we sequenced 20,661 protein coding genes, determined the presence of amplifications and deletions using high-density oligonucleotide arrays, and performed gene expression analyses using next-generation sequencing technologies in 22 human tumor samples. This comprehensive analysis led to the discovery of a variety of genes that were not known to be altered in GBMs. Most notably, we found recurrent mutations in the active site of isocitrate dehydrogenase 1 (IDH1) in 12% of GBM patients. Mutations in IDH1 occurred in a large fraction of young patients and in most patients with secondary GBMs and were associated with an increase in overall survival. These studies demonstrate the value of unbiased genomic analyses in the characterization of human brain cancer and identify a potentially useful genetic alteration for the classification and targeted therapy of GBMs.

Proceedings of the National Academy of Sciences of the United States of America | 2002

Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences

Robert L. Strausberg; Elise A. Feingold; Lynette H. Grouse; Jeffery G. Derge; Richard D. Klausner; Francis S. Collins; Lukas Wagner; Carolyn M. Shenmen; Gregory D. Schuler; Stephen F. Altschul; Barry R. Zeeberg; Kenneth H. Buetow; Carl F. Schaefer; Narayan K. Bhat; Ralph F. Hopkins; Heather Jordan; Troy Moore; Steve I. Max; Jun Wang; Florence Hsieh; Luda Diatchenko; Kate Marusina; Andrew A. Farmer; Gerald M. Rubin; Ling Hong; Mark Stapleton; M. Bento Soares; Maria F. Bonaldo; Tom L. Casavant; Todd E. Scheetz

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).

PLOS Biology | 2007

The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific

Douglas B. Rusch; Aaron L. Halpern; Granger Sutton; Karla B. Heidelberg; Shannon J. Williamson; Shibu Yooseph; Dongying Wu; Jonathan A. Eisen; Jeff Hoffman; Karin A. Remington; Karen Beeson; Bao Duc Tran; Hamilton O. Smith; Holly Baden-Tillson; Clare Stewart; Joyce Thorpe; Jason Freeman; Cynthia Andrews-Pfannkoch; Joseph E. Venter; Kelvin Li; Saul Kravitz; John F. Heidelberg; Terry Utterback; Yu-Hui Rogers; Luisa I. Falcón; Valeria Souza; Germán Bonilla-Rosso; Luis E. Eguiarte; David M. Karl; Shubha Sathyendranath

The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. 
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.

PLOS Biology | 2007

The Diploid Genome Sequence of an Individual Human

Samuel Levy; Granger Sutton; Pauline C. Ng; Lars Feuk; Aaron L. Halpern; Brian Walenz; Nelson Axelrod; Jiaqi Huang; Ewen F. Kirkness; Gennady Denisov; Yuan Lin; Jeffrey R. MacDonald; Andy Wing Chun Pang; Mary Shago; Timothy B. Stockwell; Alexia Tsiamouri; Vineet Bafna; Vikas Bansal; Saul Kravitz; Dana Busam; Karen Beeson; Tina McIntosh; Karin A. Remington; Josep F. Abril; John Gill; Jon Borman; Yu-Hui Rogers; Marvin Frazier; Stephen W. Scherer; Robert L. Strausberg

Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

PLOS Biology | 2007

The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families

Shibu Yooseph; Granger Sutton; Douglas B. Rusch; Aaron L. Halpern; Shannon J. Williamson; Karin A. Remington; Jonathan A. Eisen; Karla B. Heidelberg; Gerard Manning; Weizhong Li; Lukasz Jaroszewski; Piotr Cieplak; Christopher S. Miller; Huiying Li; Susan T. Mashiyama; Marcin P Joachimiak; Christopher van Belle; John-Marc Chandonia; David A W Soergel; Yufeng Zhai; Kannan Natarajan; Shaun W. Lee; Benjamin J. Raphael; Vineet Bafna; Robert Friedman; Steven E. Brenner; Adam Godzik; David Eisenberg; Jack E. Dixon; Susan S. Taylor

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

Nature | 2010

The dynamic genome of Hydra

Jarrod Chapman; Ewen F. Kirkness; Oleg Simakov; Steven E. Hampson; Therese Mitros; Therese Weinmaier; Thomas Rattei; Prakash G. Balasubramanian; Jon Borman; Dana Busam; Kathryn Disbennett; Cynthia Pfannkoch; Nadezhda Sumin; Granger Sutton; Lakshmi Viswanathan; Brian Walenz; David Goodstein; Uffe Hellsten; Takeshi Kawashima; Simon Prochnik; Nicholas H. Putnam; Shengquiang Shu; Bruce Blumberg; Catherine E. Dana; Lydia Gee; Dennis F. Kibler; Lee Law; Dirk Lindgens; Daniel E. Martínez; Jisong Peng

The freshwater cnidarian Hydra was first described in 1702 and has been the object of study for 300 years. Experimental studies of Hydra between 1736 and 1744 culminated in the discovery of asexual reproduction of an animal by budding, the first description of regeneration in an animal, and successful transplantation of tissue between animals. Today, Hydra is an important model for studies of axial patterning, stem cell biology and regeneration. Here we report the genome of Hydra magnipapillata and compare it to the genomes of the anthozoan Nematostella vectensis and other animals. The Hydra genome has been shaped by bursts of transposable element expansion, horizontal gene transfer, trans-splicing, and simplification of gene structure and gene content that parallel simplification of the Hydra life cycle. We also report the sequence of the genome of a novel bacterium stably associated with H. magnipapillata. Comparisons of the Hydra genome to the genomes of other animals shed light on the evolution of epithelia, contractile tissues, developmentally regulated transcription factors, the Spemann–Mangold organizer, pluripotency genes and the neuromuscular junction.

Science | 2010

A catalog of reference genomes from the human microbiome

Karen E. Nelson; George M. Weinstock; Sarah K. Highlander; Kim C. Worley; Heather Huot Creasy; Jennifer R. Wortman; Douglas B. Rusch; Makedonka Mitreva; Erica Sodergren; Asif T. Chinwalla; Michael Feldgarden; Dirk Gevers; Brian J. Haas; Ramana Madupu; Doyle V. Ward; Bruce Birren; Richard A. Gibbs; Barbara A. Methé; Joseph F. Petrosino; Robert L. Strausberg; Granger Sutton; Owen White; Richard Wilson; Scott Durkin; Michelle G. Giglio; Sharvari Gujja; Clint Howarth; Chinnappa D. Kodira; Nikos C. Kyrpides; Teena Mehta

News from the Inner Tube of Life

A major initiative by the U.S. National Institutes of Health to sequence 900 genomes of microorganisms that live on the surfaces and orifices of the human body has established standardized protocols and methods for such large-scale reference sequencing. By combining previously accumulated data with new data, Nelson et al. (p. 994) present an initial analysis of 178 bacterial genomes. The sampling so far barely scratches the surface of the microbial diversity found on humans, but the work provides an important baseline for future analyses.

Standardized protocols and methods are being established for large-scale sequencing of the microorganisms living on humans.

The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, we defined previously unidentified (“novel”) polypeptides as those with an unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (~97%) were unique. In addition, this set of microbial genomes allows ~40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. The associated metrics and standards used by our group for quality assurance are also presented.

Science | 2009

Genome Project Standards in a New Era of Sequencing

Patrick Chain; Darren Grafham; Robert S. Fulton; Michael Fitzgerald; Jessica B. Hostetler; Donna M. Muzny; J. Ali; Bruce W. Birren; David Bruce; Christian Buhay; James R. Cole; Yan Ding; Shannon Dugan; Dawn Field; George M Garrity; Richard A. Gibbs; Tina Graves; Cliff Han; Scott H. Harrison; Sarah K. Highlander; Philip Hugenholtz; H. M. Khouri; Chinnappa D. Kodira; Eugene Kolker; Nikos C. Kyrpides; D. Lang; Alla Lapidus; S. A. Malfatti; Victor Markowitz; T. Metha

More detailed sequence standards that keep up with revolutionary sequencing technologies will aid the research community in evaluating data. For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole-genome sequencing that requires reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker “draft”; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and has contributed to many wasted hours. Exponential leaps in raw sequencing capability and greatly reduced prices have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The result is an ever-widening gap between drafted and finished genomes that only promises to continue (see the figure, page 236); hence, there is an urgent need to distinguish good from poor data sets.

Proceedings of the National Academy of Sciences of the United States of America | 2010

Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle

Ewen F. Kirkness; Brian J. Haas; Weilin Sun; Henk R. Braig; M. Alejandra Perotti; John M. Clark; Si Hyeock Lee; Hugh M. Robertson; Ryan C. Kennedy; Eran Elhaik; Daniel Gerlach; Evgenia V. Kriventseva; Christine G. Elsik; Dan Graur; Catherine A. Hill; Jan A. Veenstra; Brian Walenz; Jose M. C. Tubio; José M. C. Ribeiro; Julio Rozas; J. Spencer Johnston; Justin T. Reese; Aleksandar Popadić; Marta Tojo; Didier Raoult; David L. Reed; Yoshinori Tomoyasu; Emily Kraus; Omprakash Mittapalli; Venu M. Margam

As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes fewer than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.

Science | 2010

Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics

Peter Arensburger; Karine Megy; Robert M. Waterhouse; Jenica Abrudan; Paolo Amedeo; Beatriz García Antelo; Lyric C. Bartholomay; Shelby Bidwell; Elisabet Caler; Francisco Camara; Corey L. Campbell; Kathryn S. Campbell; Claudio Casola; Marta T. Castro; Ishwar Chandramouliswaran; Sinéad B. Chapman; Scott Christley; Javier Costas; Eric Eisenstadt; Cédric Feschotte; Claire M. Fraser-Liggett; Roderic Guigó; Brian J. Haas; Martin Hammond; Bill S. Hansson; Janet Hemingway; Sharon R. Hill; Clint Howarth; Rickard Ignell; Ryan C. Kennedy

Closing the Vector Circle

The genome sequence of Culex quinquefasciatus offers a representative of the third major genus of mosquito disease vectors for comparative analysis. In a major international effort, Arensburger et al. (p. 86) uncovered divergences in the C. quinquefasciatus genome compared with the representatives of the other two genera, Aedes aegypti and Anopheles gambiae. The main difference noted is the expansion of numbers of genes, particularly for immunity, oxidoreductive functions, and digestive enzymes, which may reflect specific aspects of the Culex life cycle. Bartholomay et al. (p. 88) explored infection-response genes in Culex in more depth and uncovered 500 immune response-related genes, similar to the numbers seen in Aedes, but more than seen in Anopheles or the fruit fly Drosophila melanogaster. The higher numbers of genes were attributed partly to expansions in those encoding serpins, C-type lectins, and fibrinogen-related proteins, consistent with greater immune surveillance and associated signaling needed to monitor the dangers of breeding in polluted, urbanized environments. Transcriptome analysis confirmed that inoculation with unfamiliar bacteria prompted strong immune responses in Culex. The worm and virus pathogens that the mosquitoes transmit naturally provoked little immune activation, however, suggesting that tolerance has evolved to any damage caused by replication of the pathogens in the insects.

The genome of a third mosquito species reveals distinctions related to vector capacities and habitat preferences.

Culex quinquefasciatus (the southern house mosquito) is an important mosquito vector of viruses such as West Nile virus and St. Louis encephalitis virus, as well as of nematodes that cause lymphatic filariasis. C. quinquefasciatus is one species within the Culex pipiens species complex and can be found throughout tropical and temperate climates of the world. The ability of C. quinquefasciatus to take blood meals from birds, livestock, and humans contributes to its ability to vector pathogens between species. Here, we describe the genomic sequence of C. quinquefasciatus: Its repertoire of 18,883 protein-coding genes is 22% larger than that of Aedes aegypti and 52% larger than that of Anopheles gambiae, with multiple gene-family expansions, including olfactory and gustatory receptors, salivary gland genes, and genes associated with xenobiotic detoxification.
