Mark Stapleton

Lawrence Berkeley National Laboratory

Publication

Featured research published by Mark Stapleton.

Proceedings of the National Academy of Sciences of the United States of America | 2002

Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences.

Robert L. Strausberg; Elise A. Feingold; Lynette H. Grouse; Jeffery G. Derge; Richard D. Klausner; Francis S. Collins; Lukas Wagner; Carolyn M. Shenmen; Gregory D. Schuler; Stephen F. Altschul; Barry R. Zeeberg; Kenneth H. Buetow; Carl F. Schaefer; Narayan K. Bhat; Ralph F. Hopkins; Heather Jordan; Troy Moore; Steve I. Max; Jun Wang; Florence Hsieh; Luda Diatchenko; Kate Marusina; Andrew A. Farmer; Gerald M. Rubin; Ling Hong; Mark Stapleton; M. Bento Soares; Maria F. Bonaldo; Tom L. Casavant; Todd E. Scheetz

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).

Cell | 2011

A Protein Complex Network of Drosophila melanogaster

K. G. Guruharsha; Jean François Rual; Bo Zhai; Julian Mintseris; Pujita Vaidya; Namita Vaidya; Chapman Beekman; Christina Y. Wong; David Y. Rhee; Odise Cenaj; Emily McKillip; Saumini Shah; Mark Stapleton; Kenneth H. Wan; Charles Yu; Bayan Parsa; Joseph W. Carlson; Xiao Chen; Bhaveen Kapadia; K. VijayRaghavan; Steven P. Gygi; Susan E. Celniker; Robert A. Obar; Spyros Artavanis-Tsakonas

Determining the composition of protein complexes is an essential step toward understanding the cell as an integrated system. Using coaffinity purification coupled to mass spectrometry analysis, we examined protein associations involving nearly 5,000 individual, FLAG-HA epitope-tagged Drosophila proteins. Stringent analysis of these data, based on a statistical framework designed to define individual protein-protein interactions, led to the generation of a Drosophila protein interaction map (DPiM) encompassing 556 protein complexes. The high quality of the DPiM and its usefulness as a paradigm for metazoan proteomes are apparent from the recovery of many known complexes, significant enrichment for shared functional attributes, and validation in human cells. The DPiM defines potential novel members for several important protein complexes and assigns functional links to 586 protein-coding genes lacking previous experimental annotation. The DPiM represents, to our knowledge, the largest metazoan protein complex map and provides a valuable resource for analysis of protein complex evolution.

Genome Biology | 2002

Finishing a whole-genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence

Susan E. Celniker; David A. Wheeler; Brent Kronmiller; Joseph W. Carlson; Aaron L. Halpern; Sandeep Patel; Mark D. Adams; Mark Champe; Shannon Dugan; Erwin Frise; Ann Hodgson; Reed A. George; Roger A. Hoskins; Todd R. Laverty; Donna M. Muzny; Catherine R. Nelson; Joanne Pacleb; Soo Park; Barret D. Pfeiffer; Stephen Richards; Erica Sodergren; Robert Svirskas; Paul E. Tabor; Kenneth H. Wan; Mark Stapleton; Granger Sutton; Craig Venter; George M. Weinstock; Steven E. Scherer; Eugene W. Myers

Background: The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions. Results: Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp. Conclusions: The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.

Genome Biology | 2002

Annotation of the Drosophila melanogaster euchromatic genome: a systematic review

Sima Misra; Madeline A. Crosby; Christopher J. Mungall; Beverley B. Matthews; Kathryn S. Campbell; Pavel Hradecky; Yanmei Huang; Joshua S Kaminker; Gillian Millburn; Simon E Prochnik; Christopher D. Smith; Jonathan L Tupy; Eleanor J Whitfield; Leyla Bayraktaroglu; Benjamin P. Berman; Brian Bettencourt; Susan E. Celniker; Aubrey D.N.J. de Grey; Rachel Drysdale; Nomi L. Harris; John Richter; Susan Russo; Andrew J. Schroeder; ShengQiang Shu; Mark Stapleton; Chihiro Yamada; Michael Ashburner; William M. Gelbart; Gerald M. Rubin; Suzanna E. Lewis

Background: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. Results: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. Conclusions: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.

Genome Biology | 2009

Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions

Stewart MacArthur; Xiao-Yong Li; Jingyi Li; James B. Brown; Hou Cheng Chu; Lucy Zeng; Brandi P. Grondona; Aaron Hechmer; Lisa Simirenko; Soile V.E. Keranen; David W. Knowles; Mark Stapleton; Peter J. Bickel; Mark D. Biggin; Michael B. Eisen

Background: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. Results: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. Conclusions: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.

Genome Biology | 2002

A Drosophila full-length cDNA resource

Mark Stapleton; Joe Carlson; Peter Brokstein; Charles Yu; Mark Champe; Reed A. George; Hannibal Guarin; Brent Kronmiller; Joanne Pacleb; Soo Park; Ken Wan; Gerald M. Rubin; Susan E. Celniker

Background: A collection of sequenced full-length cDNAs is an important resource both for functional genomics studies and for the determination of the intron-exon structure of genes. Providing this resource to the Drosophila melanogaster research community has been a long-term goal of the Berkeley Drosophila Genome Project. We have previously described the Drosophila Gene Collection (DGC), a set of putative full-length cDNAs that was produced by generating and analyzing over 250,000 expressed sequence tags (ESTs) derived from a variety of tissues and developmental stages. Results: We have generated high-quality full-insert sequence for 8,921 clones in the DGC. We compared the sequence of these clones to the annotated Release 3 genomic sequence, and identified more than 5,300 cDNAs that contain a complete and accurate protein-coding sequence. This corresponds to at least one splice form for 40% of the predicted D. melanogaster genes. We also identified potential new cases of RNA editing. Conclusions: We show that comparison of cDNA sequences to a high-quality annotated genomic sequence is an effective approach to identifying and eliminating defective clones from a cDNA collection and ensure its utility for experimentation. Clones were eliminated either because they carry single nucleotide discrepancies, which most probably result from reverse transcriptase errors, or because they are truncated and contain only part of the protein-coding sequence.

Genome Biology | 2002

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome

Casey M. Bergman; Barret D. Pfeiffer; Diego E. Rincon-Limas; Roger A. Hoskins; Andreas Gnirke; Chris Mungall; Adrienne M. Wang; Brent Kronmiller; Joanne Pacleb; Soo Park; Mark Stapleton; Kenneth H. Wan; Reed A. George; Pieter J. de Jong; Juan Botas; Gerald M. Rubin; Susan E. Celniker

Background: It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results: We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences. Conclusions: Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.

Nucleic Acids Research | 2005

Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP)

Roger A. Hoskins; Mark Stapleton; Reed A. George; Charles Yu; Kenneth H. Wan; Joseph W. Carlson; Susan E. Celniker

cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5′- and 3′-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT–PCR approaches.

Genes, Chromosomes and Cancer | 2012

A new whole genome amplification method for studying clonal evolution patterns in malignant colorectal polyps

Daniela Hirsch; Jordi Camps; Sudhir Varma; Ralf Kemmerling; Mark Stapleton; Thomas Ried; Timo Gaiser

To identify the genetic drivers of colorectal tumorigenesis, we applied array comparative genomic hybridization (aCGH) to 13 formalin‐fixed paraffin‐embedded (FFPE) samples of early, localized human colon adenocarcinomas arising in high‐grade adenomas (so‐called “malignant polyps”). These lesions are small and hence the amount of DNA is limited. Additionally, the quality of DNA is compromised due to the fragmentation as a consequence of formalin fixation. To overcome these problems, we optimized a newly developed isothermal whole genome amplification system (NuGEN Ovation® WGA FFPE System). Starting with 100 ng of FFPE DNA, the amplification system produced 4.01 ± 0.29 μg (mean ± standard deviation) of DNA. The excellent quality of amplified DNA was further indicated by a high signal‐to‐noise ratio and a low derivative log2 ratio spread. Both the amount of amplified DNA and aCGH performance were independent of the age of the FFPE blocks and the associated degradation of the extracted DNA. We observed losses of chromosome arms 5q and 18q in the adenoma components of the malignant polyp samples, while the embedded early carcinomas revealed losses of 8p, 17p, and 18, and gains of 7, 13, and 20. Aberrations detected in the adenoma components were invariably maintained in the embedded carcinomas. This approach demonstrates that using isothermally whole genome amplified FFPE DNA is technically suitable for aCGH. In addition to demonstrating the clonal origin of the adenoma and carcinoma part within a malignant polyp, the gain of chromosome arm 20q was an indicator for progression from adenoma to carcinoma. Published 2012 Wiley Periodicals, Inc.
