Publication


Featured research published by Todd Smith.


Nature Biotechnology | 2014

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.

Sheng Li; Scott Tighe; Charles M. Nicolet; Deborah S. Grove; Shawn Levy; William G. Farmerie; Agnes Viale; Chris L. Wright; Peter A. Schweitzer; Yuan Gao; Dewey Kim; Joe Boland; Belynda Hicks; Ryan Kim; Sagar Chhangawala; Nadereh Jafari; Nalini Raghavachari; Jorge Gandara; Natàlia Garcia-Reyero; Cynthia Hendrickson; David Roberson; Jeffrey Rosenfeld; Todd Smith; Jason G. Underwood; May Wang; Paul Zumbo; Don Baldwin; George Grills; Christopher E. Mason

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A–selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intra-platform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
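The concordance figures in this abstract are Spearman rank correlations between expression measures. As a minimal sketch of how such a value can be computed, the following uses only the Python standard library. The gene values, the variable names `platform_a` and `platform_b`, and the helper functions are hypothetical illustrations, not data or code from the ABRF-NGS study.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical expression values for the same five genes on two platforms.
platform_a = [120.0, 15.0, 980.0, 0.5, 47.0]
platform_b = [200.0, 30.0, 1500.0, 2.0, 25.0]
rho = spearman(platform_a, platform_b)
```

Because only ranks enter the calculation, platforms that report expression on different scales can still be compared, which is one reason a rank-based measure suits cross-platform comparisons.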


PLOS ONE | 2012

Limitations of the Human Reference Genome for Personalized Genomics

Jeffrey A. Rosenfeld; Christopher E. Mason; Todd Smith

Data from the 1000 Genomes Project (1KGP) and Complete Genomics (CG) have dramatically increased the numbers of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS studies for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the numbers of markers used, and the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset, likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population, and significant efforts are needed to create resources that can accurately assess personal genomics for health, disease, and predict treatment outcomes.


Mayo Clinic Proceedings | 2012

Transcriptional Profiling by Sequencing of Oropharyngeal Cancer

Rebecca R. Laborde; Vivian W. Wang; Todd Smith; N. Eric Olson; Steven M. Olsen; Joaquin J. Garcia; Kerry D. Olsen; Eric J. Moore; Jan L. Kasperbauer; Nicole M. Tombers; David I. Smith

OBJECTIVE: To compare full transcriptome expression levels of matched tumor and normal samples from patients with oropharyngeal carcinoma stratified by known tumor etiologic factors. PATIENTS AND METHODS: Full transcriptome sequencing was analyzed for 10 matched tumor and normal tissue samples from patients with previously untreated oropharyngeal carcinoma. Transcriptomes were analyzed using massively parallel messenger RNA sequencing and validated using the NanoString nCounter system. Global gene expression levels were compared in samples grouped by smoking status and human papillomavirus status. This study was completed between June 10, 2010, and June 30, 2011. RESULTS: Global gene expression analysis indicated that tumor tissue from former smokers grouped more closely with that of never smokers than with that of current smokers. Pathway analysis revealed alterations in the expression of genes involved in the p53 DNA damage-repair pathway, including CHEK2 and ATR, which display patterns of increased expression that are associated with human papillomavirus-negative current smokers rather than former or never smokers. CONCLUSION: These findings support the application of messenger RNA sequencing technology as an important clinical tool for more accurately stratifying patients based on individual tumor biology with the goal of improving our understanding of tumor prognosis and treatment response, ultimately leading to individualized patient care strategies.


Advances in Experimental Medicine and Biology | 2010

Standardizing the Next Generation of Bioinformatics Software Development with BioHDF (HDF5)

Christopher E. Mason; Paul Zumbo; Stephan J. Sanders; Mike Folk; Dana Robinson; Ruth Aydt; Martin Gollery; Mark Welsh; N. Eric Olson; Todd Smith

Next Generation Sequencing technologies are limited by the lack of standard bioinformatics infrastructures that can reduce data storage, increase data processing performance, and integrate diverse information. HDF technologies address these requirements and have a long history of use in data-intensive science communities. They include general data file formats, libraries, and tools for working with the data. Compared to emerging standards, such as the SAM/BAM formats, HDF5-based systems demonstrate significantly better scalability, can support multiple indexes, store multiple data types, and are self-describing. For these reasons, HDF5 and its BioHDF extension are well suited for implementing data models to support the next generation of bioinformatics applications.


Current Protocols in Human Genetics | 2009

Analyzing Gene Expression Data from Microarray and Next‐Generation DNA Sequencing Transcriptome Profiling Assays Using GeneSifter Analysis Edition

Sandra Porter; N. Eric Olson; Todd Smith

Transcription profiling with microarrays has become a standard procedure for comparing the levels of gene expression between pairs of samples, or multiple samples following different experimental treatments. New technologies, collectively known as next‐generation DNA sequencing methods, are also starting to be used for transcriptome analysis. These technologies, with their low background, large capacity for data collection, and dynamic range, provide a powerful and complementary tool to the assays that formerly relied on microarrays. In this chapter, we describe two protocols for working with microarray data from pairs of samples and samples treated with multiple conditions, and discuss alternative protocols for carrying out similar analyses with next‐generation DNA sequencing data from two different instrument platforms (Illumina GA and Applied Biosystems SOLiD). Curr. Protoc. Bioinform. 27:7.14.1‐7.14.35.


Genome Biology | 2010

Making cancer transcriptome sequencing assays practical for the research and clinical scientist

Todd Smith; N. Eric Olson; David I. Smith

Next generation DNA sequencing (NGS) technologies are increasing in their appeal for studying cancer genomics. High-throughput data and a growing repertoire of applications that quantitatively measure gene expression, splicing, noncoding RNAs, and genomic variation are revealing that cancer is a more complex and heterogeneous disease than previously imagined. Fully characterizing the ~10,000 types and subtypes of cancer that exist to develop biomarkers that can be used to clinically define tumors and target specific treatments requires large studies that examine specific tumors in thousands of patients. This goal will fail without significantly reducing both data production and analysis costs, so that most cancer biologists and clinicians can conduct NGS assays and analyze their data in routine ways. Currently, most cancer biology NGS papers are published either by genome centers or through collaborations with instrument vendors. However, this is going to change rapidly with efforts like the Cancer Genome Anatomy project. In any case, large teams of bioinformaticians are involved in analyzing data through labor-intensive processes. With refinements offered by the Illumina HiSeq 2000, or Life Technologies SOLiD 4, the cost of collecting data for transcriptome analysis and mate-pair genome sequencing is sufficiently inexpensive for small groups and individuals, beyond genome centers, to conduct the required studies. However, current data analysis methods need to be automated with established tools in scalable and adaptable systems that provide standard reports to make results available to enable interactive exploration by biologists and clinicians. In our presentation, we will examine the time and costs required to analyze data that will be collected in future cancer studies. 
Using data from existing matched tumor and normal transcriptome studies from random oral cancer samples, and samples grouped by drinking and smoking behavior (as a tool to define data analysis requirements), we will compare the costs of conducting large studies using current data analysis approaches with those using integrated software systems to demonstrate how automation reduces costs, while providing comparable results for identifying transcript isoforms, mutations and novel translocations. Geospiza's GeneSifter distributed cloud-based software architecture, including open source tools like BioHDF, will be described to share insights into high performance computing requirements for scalable data processing.


Pigment Cell & Melanoma Research | 2009

GeneSifter, not so blind after all.

N. Eric Olson; Jeff Kozlowski; Sandra Porter; Todd Smith

Dear Sir, We are writing this letter in response to ‘GeneSifter, leading the blind,’ a letter published last fall (Hoek, 2008) about GeneSifter, a software product for analyzing data from microarray and next generation DNA sequencing experiments (http://www.geospiza.com). We wish to address the areas of concern described in the article: the methods used for data normalization and concerns regarding how fold-change cut-offs and statistical tests are applied. Hoek raises valid points regarding both the need for between-chip normalization and the fact that the order of filtering steps can have dramatic effects on the control for false positives; however, we maintain that Hoek’s criticisms were based on an incomplete knowledge of GeneSifter features, capabilities and proper operation. We believe this misunderstanding most likely resulted from a failure on the part of VizXLabs (the previous owners of GeneSifter) to adequately respond to Hoek’s concerns. As the new owners of GeneSifter, we would like to address the criticisms that Hoek raised. Hoek pointed out the necessity for normalizing microarray measurements both within a chip and between multiple chips and wrote that GeneSifter fails to account for the differences between chips when normalizing data (Hoek, 2008). It’s probable that Hoek missed seeing GeneSifter’s options for normalization because they are not visible from the pairwise analysis interface. GeneSifter does include methods for normalizing data within chips and between chips, but also gives users a choice to upload normalized data without further processing; this option allows users to circumvent errors that would result from re-processing already normalized data. The normalization options within GeneSifter include methods such as RMA (Robust Multi-array Average) and GC-RMA (Irizarry et al., 2003; Wu et al., 2004) for normalizing data between multiple Affymetrix chips. 
It should be noted, however, that these methods must be applied at the time of loading the data rather than at the time of analysis. Consequently, these methods are not available through the pairwise analysis interface and could be missed by someone who is new to the software. We believe these details were not communicated to Hoek, leaving him with the impression that these more robust normalization methods are missing from the GeneSifter package. Hoek’s second concern was with the order in which a fold-change cut-off is applied and its effects on corrections for multiple testing. Hoek stated that the order of statistical operations in GeneSifter was incorrect because users could apply a fold-change filter to averaged data before performing statistical tests and that users were left unable to change the options or perform steps in the appropriate manner, resulting in weak control for false positives. While it is true that this ordering can be used in GeneSifter, the product has always allowed the statistical test and corrections to be performed prior to applying a fold-change cut-off. As with the normalization concerns, we believe this misunderstanding arose from a failure to communicate the options available in GeneSifter. The default settings for pairwise statistics did perform a threshold cut-off first and then apply a correction based on the number of genes that passed the initial cut-off. These settings were used to facilitate discovery by decreasing the possibility of false negatives. In contrast to the statement in the article, however, GeneSifter users have always had the option to change the order of steps by using the preference settings in the GeneSifter program. The last point we wish to address is the comparison between data analyzed with GeneSifter and the same data analyzed with GeneSpring (Agilent Technologies). Hoek analyzed a publicly available colon cancer data set (GEO accession no. GDS756) with both GeneSifter and GeneSpring. 
The GeneSpring analysis yielded 449 genes with significant differences in expression, whereas analyzing the data with GeneSifter produced a list of 1556 genes. This difference, however, did not result from a true difference in the software platforms; the difference in gene number arose because different methods were used for the analyses. When we use the same analysis procedure with GeneSifter that Hoek used with GeneSpring, we obtain similar numbers of significantly expressed genes. Since the publication of Hoek’s letter, we have made changes in the default settings in the pairwise analysis interface to reduce the likelihood that users would inadvertently lower their control for false positives. Users still have the option, however, to change their preference settings and reverse the order should the need arise.
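The ordering issue discussed in this letter can be illustrated with a small sketch. A multiple-testing correction scales with the number of genes tested, so applying a fold-change cut-off first shrinks that number and makes the correction less stringent. The letter does not name the correction method; Benjamini-Hochberg is assumed here purely for illustration, and the gene names, fold changes, p-values and thresholds are all hypothetical. For simplicity, only up-regulation is considered. None of this reflects GeneSifter's or GeneSpring's actual implementation.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for position in range(m - 1, -1, -1):
        i = order[position]
        running_min = min(running_min, pvalues[i] * m / (position + 1))
        adjusted[i] = running_min
    return adjusted


# Hypothetical genes: (name, fold change, raw p-value).
GENES = [
    ("g1", 3.0, 0.001),
    ("g2", 2.5, 0.012),
    ("g3", 1.1, 0.020),
    ("g4", 1.2, 0.300),
    ("g5", 2.2, 0.045),
    ("g6", 1.0, 0.800),
]
FOLD_CUTOFF = 2.0  # assumed threshold
ALPHA = 0.05       # assumed significance level


def filter_then_correct(genes):
    """Apply the fold-change cut-off first, then correct the survivors."""
    kept = [g for g in genes if g[1] >= FOLD_CUTOFF]
    adjusted = benjamini_hochberg([g[2] for g in kept])
    return [g[0] for g, q in zip(kept, adjusted) if q <= ALPHA]


def correct_then_filter(genes):
    """Correct across all genes first, then apply the fold-change cut-off."""
    adjusted = benjamini_hochberg([g[2] for g in genes])
    return [
        g[0]
        for g, q in zip(genes, adjusted)
        if q <= ALPHA and g[1] >= FOLD_CUTOFF
    ]


filter_first = filter_then_correct(GENES)
correct_first = correct_then_filter(GENES)
```

With these made-up numbers, `filter_first` keeps g5 while `correct_first` drops it, because g5's p-value is adjusted against three tests in one case and six in the other. This mirrors the trade-off the letter describes between fewer false negatives and stricter control of false positives.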


PLOS ONE | 2018

Bioinformatics core competencies for undergraduate life sciences education

Melissa A. Wilson Sayres; Charles Hauser; Michael L. Sierk; Srebrenka Robic; Anne G. Rosenwald; Todd Smith; Eric W. Triplett; Jason Williams; Elizabeth A. Dinsdale; William Morgan; James M. Burnette; Samuel S. Donovan; Jennifer C. Drew; Sarah C. R. Elgin; Edison Fowlks; Sebastian Galindo-Gonzalez; Anya Goodman; Nealy F. Grandgenett; Carlos C. Goller; John R. Jungck; Jeffrey D. Newman; William R. Pearson; Elizabeth F. Ryder; Rafael Tosado-Acevedo; William E. Tapprich; Tammy Tobin; Arlín Toro-Martínez; Lonnie R. Welch; Robin Wright; Lindsay Barone

Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie Classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.


Blood | 2007

A sequence variation scan of the coagulation factor VIII (FVIII) structural gene and associations with plasma FVIII activity levels

Kevin R. Viel; Deepa K. Machiah; Diane Warren; Manana Khachidze; Alfonso Buil; Karl Fernstrom; Juan Carlos Souto; Juan Manuel Peralta; Todd Smith; John Blangero; Sandra Porter; Stephen T. Warren; Jordi Fontcuberta; José Manuel Soria; W. Dana Flanders; Laura Almasy; Tom E. Howard


Advances in Experimental Medicine and Biology | 2014

Characterizing Multi-omic Data in Systems Biology

Christopher E. Mason; Sandra Porter; Todd Smith

Collaboration


Dive into Todd Smith's collaborations.

Top Co-Authors

N. Eric Olson

University of Washington

Agnes Viale

Memorial Sloan Kettering Cancer Center

Charles M. Nicolet

University of Wisconsin-Madison

David Roberson

Science Applications International Corporation

Deborah S. Grove

Pennsylvania State University
