Publication


Featured research published by Daniel Quest.


PLOS ONE | 2012

The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation

Konstantinos Mavromatis; Miriam Land; Thomas Brettin; Daniel Quest; Alex Copeland; Alicia Clum; Lynne Goodwin; Tanja Woyke; Alla Lapidus; Hans-Peter Klenk; Robert W. Cottingham; Nikos C. Kyrpides

Background: The emergence of next generation sequencing (NGS) has provided the means for rapid, high-throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the quality of the sequencing technology used but also that of the analysis software employed for assembly and annotation. Methodology/Principal Findings: In this work, we have explored the quality of microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy Joint Genome Institute (JGI) and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their potential sources of error should be considered prior to analysis. Conclusion: These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role, to assemblies with multiple small gaps (Illumina), and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
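The abstract does not list its assembly metrics. As an illustration only, the Python sketch below computes two widely used statistics that relate to the gap patterns described in the conclusion: N50 (a contiguity measure) and the number of gap runs (stretches of `N`) in a scaffold. Whether the study used these exact metrics is an assumption, and the function names are hypothetical.

```python
import re

def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

def count_gaps(scaffold, min_run=1):
    """Count runs of 'N'/'n' (assembly gaps) that are at least min_run bases long."""
    return sum(1 for m in re.finditer(r"[Nn]+", scaffold) if len(m.group()) >= min_run)
```

For example, contigs of 100, 80, 50, 30 and 20 bases give an N50 of 80, because the two longest contigs already cover more than half of the 280 bases.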


BMC Bioinformatics | 2011

Scenario driven data modelling: a method for integrating diverse sources of data and data streams

Thomas Brettin; Robert W. Cottingham; Shelton D. Griffith; Daniel Quest

Background: Biology is rapidly becoming a data-intensive, data-driven science. It is essential that data be represented and connected in ways that capture their full conceptual content and allow both automated integration and data-driven decision-making. Recent advancements in distributed multi-relational directed graphs, implemented in the form of the Semantic Web, make it possible to deal with complicated heterogeneous data in new and interesting ways. Results: This paper presents a new approach, scenario driven data modelling (SDDM), that integrates multi-relational directed graphs with data streams. SDDM can be applied to virtually any data integration challenge involving widely divergent types of data and data streams. In this work, we explored integrating genetics data with reports from traditional media. SDDM was applied to the New Delhi metallo-beta-lactamase gene (NDM-1), an emerging global health threat. The SDDM process constructed a scenario; created an RDF multi-relational directed graph that linked diverse types of data to the Semantic Web; implemented RDF conversion tools (RDFizers) to bring content into the Semantic Web; identified data streams and analytical routines to analyse those streams; and identified user requirements and the graph traversals needed to meet them. Conclusions: We provided an example where SDDM was applied to a complex data integration challenge. The process created a model of the emerging NDM-1 health threat, identified and filled gaps in that model, and constructed reliable software that monitored data streams based on the scenario-derived multi-relational directed graph. The SDDM process significantly shortened the software requirements phase by letting the scenario and the resulting multi-relational directed graph define what is possible and then set the scope of the user requirements. Approaches like SDDM will be critical to the future of data-intensive, data-driven science because they automate the process of converting massive data streams into usable knowledge.
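A minimal sketch of the kind of multi-relational directed graph and graph traversal the SDDM abstract describes, assuming the triples are held in memory rather than in an RDF store. All node and predicate names (`gene:NDM-1`, `ex:detectedIn`, and so on) are hypothetical placeholders, not terms from the paper or from any real ontology, and `traverse` is an illustrative helper, not part of SDDM.

```python
from collections import defaultdict

# A multi-relational directed graph stored as (subject, predicate, object) triples.
# Every identifier below is a hypothetical placeholder.
TRIPLES = [
    ("gene:NDM-1", "ex:confersResistanceTo", "drug:carbapenem"),
    ("gene:NDM-1", "ex:detectedIn", "isolate:A"),
    ("isolate:A", "ex:collectedIn", "place:CountryX"),
    ("report:1", "ex:mentions", "gene:NDM-1"),
]

def build_index(triples):
    """Index outgoing edges by (subject, predicate) for fast traversal."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index

def traverse(index, start, path):
    """Follow a sequence of predicates from start and return the set of nodes reached."""
    frontier = {start}
    for predicate in path:
        # .get() avoids inserting empty entries into the defaultdict
        frontier = {o for node in frontier for o in index.get((node, predicate), ())}
    return frontier
```

A path query such as `traverse(build_index(TRIPLES), "report:1", ["ex:mentions", "ex:confersResistanceTo"])` links a media report to a drug class through the gene it mentions, which is the style of cross-source question the scenario graph is meant to answer.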


BMC Bioinformatics | 2005

A method of precise mRNA/DNA homology-based gene structure prediction

Alexander G. Churbanov; Mark A. Pauley; Daniel Quest; Hesham H. Ali

Background: Accurate and automatic gene finding and structure prediction is a common problem in bioinformatics, and applications need to be capable of handling non-canonical splice sites, micro-exons, and partial gene structure predictions that span several genomic clones. Results: We present an mRNA/DNA homology-based gene structure prediction tool, GIGOgene. We use a new splice-enhanced global alignment algorithm with affine gap penalties, running in linear memory, for high-quality annotation of splice sites. Our tool includes a novel algorithm that assembles partial gene structure predictions using interval graphs. GIGOgene exhibited a sensitivity of 99.08% and a specificity of 99.98% on the Genie learning set, and demonstrated higher-quality gene structure prediction than Sim4, est2genome, Spidey, Galahad and BLAT, including for genes containing micro-exons and non-canonical splice sites. GIGOgene showed an acceptable loss of prediction quality when confronted with a noisy Genie learning set simulating ESTs. Conclusion: GIGOgene shows higher-quality gene structure prediction for mRNA/DNA spliced alignment than other available tools.
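GIGOgene's splice-enhanced scoring is not described in the abstract, so the sketch below shows only the underlying textbook technique: global alignment with affine gap penalties (Gotoh's three-state recurrence), keeping a single row of each matrix so memory grows with the length of one sequence. The scoring values are arbitrary assumptions, and the function returns only the optimal score; recovering the alignment itself in linear memory would need a divide-and-conquer step (Hirschberg-style) that is omitted here.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gaps (Gotoh), using O(len(b)) memory.
    A gap of length k scores gap_open + (k - 1) * gap_extend."""
    m = len(b)
    # M: ends in a (mis)match; X: a[i] aligned to a gap; Y: b[j] aligned to a gap.
    M = [NEG_INF] * (m + 1)
    X = [NEG_INF] * (m + 1)
    Y = [NEG_INF] * (m + 1)
    M[0] = 0.0
    for j in range(1, m + 1):
        Y[j] = gap_open + (j - 1) * gap_extend  # leading gap in a
    for i in range(1, len(a) + 1):
        new_m = [NEG_INF] * (m + 1)
        new_x = [NEG_INF] * (m + 1)
        new_y = [NEG_INF] * (m + 1)
        new_x[0] = gap_open + (i - 1) * gap_extend  # leading gap in b
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Diagonal move from the previous row.
            new_m[j] = s + max(M[j - 1], X[j - 1], Y[j - 1])
            # Vertical move: open a new gap or extend an existing one.
            new_x[j] = max(M[j] + gap_open, Y[j] + gap_open, X[j] + gap_extend)
            # Horizontal move within the current row.
            new_y[j] = max(new_m[j - 1] + gap_open, new_x[j - 1] + gap_open,
                           new_y[j - 1] + gap_extend)
        M, X, Y = new_m, new_x, new_y
    return max(M[m], X[m], Y[m])
```

With these parameters, aligning `AAAA` to `AA` scores -2: one gap of length 2 (-3 - 1) plus two matches, which beats two separate single-base gaps (-3 - 3 + 2 = -4). That preference for fewer, longer gaps is what makes affine penalties suitable for introns.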


Computational Systems Bioinformatics | 2003

A new approach for gene annotation using unambiguous sequence joining

Alexandre Tchourbanov; Daniel Quest; Hesham H. Ali; Mark A. Pauley; Robert B. Norgren

This paper addresses accurate and automatic gene annotation through precise identification and annotation of exon/intron boundaries in biologically verified nucleotide sequences, using alignments of human genomic DNA to curated mRNA transcripts. We provide a detailed description of a new cDNA/DNA homology-based gene annotation algorithm that combines the results of BLASTN searches and spliced alignments. Compared with other programs currently in use, annotation quality is significantly increased through the unambiguous joining of genomic DNA sequences. We also address gene annotation with both non-canonical splice sites and short exons. The approach has been tested on the Genie learning subset as well as on the full-scale human RefSeq, and has demonstrated performance as high as 97%.
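The abstract does not define "unambiguous joining" precisely. One simple interpretation, assumed here purely for illustration, is to join two overlapping genomic sequences only when exactly one exact suffix/prefix overlap of sufficient length exists, and to refuse the join otherwise. The function name and the `min_overlap` threshold are hypothetical and are not taken from the paper.

```python
def unambiguous_join(left, right, min_overlap=20):
    """Join two sequences on an exact suffix/prefix overlap of at least
    min_overlap bases, but only if exactly one such overlap length exists.
    Returns the joined sequence, or None if there is no overlap or it is ambiguous."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    max_len = min(len(left), len(right))
    candidates = [k for k in range(min_overlap, max_len + 1) if left[-k:] == right[:k]]
    if len(candidates) != 1:
        return None  # no overlap, or several equally valid overlaps
    k = candidates[0]
    return left + right[k:]
```

Repetitive sequence is where this matters: two runs of the same base overlap at many lengths, so the sketch declines to join them instead of picking one arbitrarily.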


BMC Bioinformatics | 2010

Next generation models for storage and representation of microbial biological annotation

Daniel Quest; Miriam Land; Thomas Brettin; Robert W. Cottingham

Background: Traditional genome annotation systems were developed in a very different computing era, one in which the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high-quality annotation submissions to GenBank/EMBL, supported by expert manual curation. The exponential growth of sequence data drives a growing need for higher-quality, automatically generated annotation. Typical annotation pipelines use traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third-party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and are therefore not scientifically tractable. The advent of Semantic Web standards such as the Resource Description Framework (RDF) and the OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new, comprehensive way. Results: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third-party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human-readable, yet semantically aware, equivalent to GenBank/EMBL files. Conclusions: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes that construct genome annotation and ultimately be able to help improve the systems that produce it.
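To make the TURTLE idea concrete, the sketch below serializes one GenBank-style feature as a Turtle resource. The `ex:` namespace, the property names and the example values are hypothetical; a real system would use terms from an OWL ontology, and locus tags containing characters that are not allowed in Turtle local names would need escaping or full IRIs.

```python
def turtle_escape(text):
    """Escape a Python string for use inside a double-quoted Turtle literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def feature_to_turtle(feature):
    """Serialize one GenBank-style feature (a dict) as a Turtle resource.
    The ex: vocabulary is a hypothetical stand-in for a real ontology."""
    subject = "ex:" + feature["locus_tag"]
    lines = [
        "@prefix ex: <http://example.org/annotation#> .",
        "",
        f"{subject} a ex:{feature['type']} ;",
        f'    ex:start {int(feature["start"])} ;',
        f'    ex:end {int(feature["end"])} ;',
        f'    ex:strand "{turtle_escape(feature["strand"])}" ;',
        f'    ex:product "{turtle_escape(feature["product"])}" .',
    ]
    return "\n".join(lines)

print(feature_to_turtle({"locus_tag": "EXAMPLE_0001", "type": "Gene", "start": 100,
                         "end": 400, "strand": "+", "product": "hypothetical protein"}))
```

The output stays readable line by line, much like a GenBank feature table, while every line is also an RDF statement that a triple store can load and query.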


Methods in Molecular Biology | 2010

The Motif Tool Assessment Platform (MTAP) for Sequence-Based Transcription Factor Binding Site Prediction Tools

Daniel Quest; Hesham H. Ali

Predicting transcription factor binding sites (TFBS) from sequence is one of the most challenging problems in computational biology. (Semi-)automated, computer-assisted prediction methods are needed to find TFBS across an entire genome, which is a first step in reconstructing the mechanisms that control gene activity. Bioinformatics journals continue to publish diverse methods for predicting TFBS on a monthly basis. To help practitioners decide which method to use for predicting a particular TFBS, we provide a platform to assess the quality and applicability of the available methods. Assessment tools allow researchers to determine how methods can be expected to perform on specific organisms or on specific transcription factor families. This chapter introduces the TFBS detection problem, reviews current strategies for evaluating algorithm effectiveness, and then introduces and discusses a novel and robust assessment tool, the Motif Tool Assessment Platform (MTAP).
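Assessment platforms commonly score predicted binding sites against known sites at the nucleotide level. The sketch below computes nucleotide-level sensitivity, positive predictive value and performance coefficient for sites on a single sequence; these are standard choices in the motif-assessment literature, but whether MTAP uses exactly these definitions is an assumption.

```python
def nucleotide_metrics(known_sites, predicted_sites):
    """Nucleotide-level sensitivity, positive predictive value (PPV) and
    performance coefficient (PC) for binding-site predictions.
    Sites are half-open (start, end) intervals on the same sequence."""
    def covered(sites):
        return {pos for start, end in sites for pos in range(start, end)}

    known, predicted = covered(known_sites), covered(predicted_sites)
    tp = len(known & predicted)   # positions predicted and truly bound
    fn = len(known - predicted)   # bound positions that were missed
    fp = len(predicted - known)   # predicted positions with no known site
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    pc = tp / (tp + fn + fp) if tp + fn + fp else 0.0
    return sensitivity, ppv, pc
```

A prediction at positions 15 to 24 against a known site at 10 to 19 shares five positions, so sensitivity and PPV are both 0.5 and the performance coefficient, which penalizes both kinds of error at once, is one third.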


International Parallel and Distributed Processing Symposium | 2008

A parallel architecture for regulatory motif algorithm assessment

Daniel Quest; Kathryn Dempsey; M. Shaflullah; Dhundy Kiran Bastola; Hesham H. Ali

Computational discovery of cis-regulatory motifs has become one of the more challenging problems in bioinformatics. In recent years, over 150 methods have been proposed as solutions; however, it remains difficult to characterize the advantages and disadvantages of these approaches because of the wide variability of approaches and datasets. Although biologists want the program and parameter set most appropriate for cis-regulatory discovery in their domain of interest, compiling such a list is a great computational challenge. First, a discovery pipeline for 150+ methods must be automated, and then each dataset of interest must be used to grade the methods. Automation is challenging because these programs are intended to be used over a small set of sites and consequently have many manual steps intended to help the user fine-tune the program to specific problems or organisms. If a program is tuned to parameters other than those used in the original paper, it is not guaranteed to have the same sensitivity and specificity. Consequently, there are few methods that rank motif discovery tools. This paper proposes a parallel framework for the automation and evaluation of cis-regulatory motif discovery tools. This evaluation platform can both run and benchmark motif discovery tools over a wide range of parameters, and it is the first method to consider both multiple binding locations within a regulatory region and regulatory regions of orthologous genes. Because of the large number of tests required, we implemented this platform on a computing cluster to increase performance.
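The paper's framework ran on a computing cluster; as a single-machine analogue only, the sketch below expands a parameter grid and runs every (tool, dataset, parameter setting) job through a thread pool. The `run_one` callback is hypothetical: in practice it would launch an external motif-discovery program and score its output. Threads are a reasonable fit when each job mostly waits on an external process; CPU-bound Python scoring would instead call for processes or a cluster scheduler.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def parameter_grid(space):
    """Expand {"name": [values, ...]} into a list of parameter dicts (all combinations)."""
    names = sorted(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]

def benchmark(tools, datasets, space, run_one, max_workers=4):
    """Run every (tool, dataset, parameter setting) job concurrently.
    run_one(tool, dataset, params) is a caller-supplied, hypothetical function
    that performs one motif-discovery run and returns a score."""
    jobs = [(t, d, p) for t in tools for d in datasets for p in parameter_grid(space)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves job order, so scores line up with jobs.
        scores = list(pool.map(lambda job: run_one(*job), jobs))
    return [(t, d, tuple(sorted(p.items())), s) for (t, d, p), s in zip(jobs, scores)]
```

Because the grid grows multiplicatively (tools times datasets times every parameter combination), even a modest sweep produces thousands of runs, which is why the original platform needed a cluster.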


Hawaii International Conference on System Sciences | 2005

A Grammar Based Approach for Mining Bioinformatics Databases

Daniel Quest; Hesham H. Ali

In this paper we introduce a new formal approach for mining biological data sets. The proposed grammar-based approach provides a flexible and powerful tool for advanced sequence comparison and data mining. The approach benefits from the power of regular expressions in allowing the use of advanced queries for comparing sequences and searching for motifs or sequence attributes in biological databases. The formal grammar and the corresponding data mining engine are capable of extracting records from biological databases, filtering a subset of those records for mining, and then sorting those records based on a similarity scheme designed by the user. This model is based on the objective (ontology) of the user, and scoring is dynamic, being provided at runtime.
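A minimal sketch of the filter-then-rank workflow described above, using Python's `re` module for the motif query and a scoring function passed in at runtime. The record layout (`id`, `seq`), the example pattern and the scoring function are assumptions for illustration; the paper's formal grammar is more general than a single regular expression.

```python
import re

def mine(records, pattern, score):
    """Keep records whose sequence matches a regular-expression motif,
    then rank them with a user-supplied scoring function (highest first)."""
    motif = re.compile(pattern)
    hits = [r for r in records if motif.search(r["seq"])]
    return sorted(hits, key=score, reverse=True)

# Hypothetical usage: rank records by how many non-overlapping motif copies they contain.
pattern = "TATA[AT]A"
records = [{"id": "r1", "seq": "GGTATAAAGG"},
           {"id": "r2", "seq": "CCCCCC"},
           {"id": "r3", "seq": "TATAAATTATATAT"}]
ranked = mine(records, pattern, lambda r: len(re.findall(pattern, r["seq"])))
```

Passing `score` as a function is what keeps the ranking dynamic: the same query can be re-sorted under a different similarity scheme without changing the extraction or filtering steps.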


Computational Systems Bioinformatics | 2004

Ontology specific data mining based on dynamic grammars

Daniel Quest; Hesham H. Ali


Archive | 2012

Screening Tool for Providers of Synthetic Double Stranded DNA

Thomas Brettin; Robert W. Cottingham; Daniel Quest

Collaboration

Top Co-Authors

Hesham H. Ali (University of Nebraska Omaha)
Thomas Brettin (Oak Ridge National Laboratory)
Robert W. Cottingham (Oak Ridge National Laboratory)
Mark A. Pauley (University of Nebraska Omaha)
Miriam Land (Oak Ridge National Laboratory)
Alicia Clum (Joint Genome Institute)
Dhundy Kiran Bastola (University of Nebraska Omaha)
Kathryn Dempsey (University of Nebraska Omaha)