Publication


Featured research published by Hyrum Carroll.


Bioinformatics | 2007

DNA reference alignment benchmarks based on tertiary structure of encoded proteins

Hyrum Carroll; Wesley A. Beckstead; Timothy O'Connor; Mark T. W. Ebbert; Mark J. Clement; Quinn Snell; David A. McClellan

MOTIVATION Multiple sequence alignments (MSAs) are at the heart of bioinformatics analysis. Recently, a number of multiple protein sequence alignment benchmarks (i.e. BAliBASE, OXBench, PREFAB and SMART) have been released to evaluate new and existing MSA applications. These databases have been well received by researchers and help to quantitatively evaluate MSA programs on protein sequences. Unfortunately, analogous DNA benchmarks are not available, making evaluation of MSA programs difficult for DNA sequences. RESULTS This work presents the first known multiple DNA sequence alignment benchmarks that are (1) composed of protein-coding portions of DNA and (2) based on biological features such as the tertiary structure of encoded proteins. These reference DNA databases contain a total of 3545 alignments, comprising 68 581 sequences. Two versions of the database are available: mdsa_100s and mdsa_all. The mdsa_100s version contains the alignments of the data sets for which TBLASTN found 100% sequence identity for each sequence. The mdsa_all version includes all hits with an E-value score above the threshold of 0.001. A primary use of these databases is to benchmark the performance of MSA applications on DNA data sets. The first such case study is included in the Supplementary Material.
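
As an illustration of how a reference alignment can be used to benchmark an MSA program, the sketch below computes a sum-of-pairs style score: the fraction of residue pairs aligned in the reference that the tested alignment also aligns. This is a common way to score alignments against benchmarks, but the abstract does not say which measure the case study used, so the metric choice and the function names here are assumptions.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    alignment: list of equal-length gapped strings, '-' marks a gap.
    Each residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != '-':
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["ACGT", "A-GT"]
test = ["ACGT", "AG-T"]
score = sum_of_pairs_score(test, reference)
```

A score of 1.0 means the tested alignment reproduces every aligned pair in the reference; the toy sequences above are made up and are not taken from the mdsa databases.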


Bioinformatics | 2010

Threshold Average Precision (TAP-k)

Hyrum Carroll; Maricel G. Kann; Sergey L. Sheetlin; John L. Spouge

Motivation: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROCn) score, the area under the ROC curve (AUC) of a ‘pooled’ ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROCn score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROCn score can be very sensitive to retrieval results from as few as a single query. Methods: To replace the pooled ROCn score, we propose the Threshold Average Precision (TAP-k), a measure closely related to the well-known average precision in information retrieval, but reflecting the usage of E-values in bioinformatics. Furthermore, in addition to conditions previously given in the literature, we introduce three new criteria that an ideal measure of retrieval efficacy should satisfy. Results: PSI-BLAST, GLOBAL, HMMER and RPS-BLAST provided examples of using the TAP-k and pooled ROCn scores to evaluate sequence retrieval algorithms. In particular, compelling examples using real data highlight the drawbacks of the pooled ROCn score, showing that it can produce evaluations skewing far from intuitive expectations. In contrast, the TAP-k satisfies most of the criteria desired in an ideal measure of retrieval efficacy. Availability and Implementation: The TAP-k web server and downloadable Perl script are freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/tap/ Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.
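
To make the idea of average precision with an E-value cutoff concrete, the sketch below computes the standard information-retrieval average precision over a ranked hit list that is truncated at an E-value threshold. It is not the TAP-k definition from the paper, whose exact formula the abstract does not give; the function name, arguments, and example values are assumptions for illustration only.

```python
def average_precision_at_threshold(hits, total_relevant, e_threshold):
    """Average precision over hits whose E-value is at or below a cutoff.

    hits: list of (e_value, is_relevant) pairs for one query.
    total_relevant: number of relevant records that exist for the query.
    Lower E-values rank first, as in BLAST-style output.
    """
    ranked = sorted(hits, key=lambda hit: hit[0])
    relevant_seen = 0
    precision_sum = 0.0
    for rank, (e_value, is_relevant) in enumerate(ranked, start=1):
        if e_value > e_threshold:
            break  # the user would stop reading the list here
        if is_relevant:
            relevant_seen += 1
            precision_sum += relevant_seen / rank
    return precision_sum / total_relevant if total_relevant else 0.0


# Made-up hit list: three relevant records exist, one falls past the cutoff.
hits = [(1e-10, True), (1e-5, False), (1e-3, True), (5.0, True)]
ap = average_precision_at_threshold(hits, total_relevant=3, e_threshold=0.01)
```

Because the score is computed per query and then averaged over queries, a single unusual query cannot dominate it the way it can dominate a pooled curve.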


BMC Genomics | 2014

Deep sequencing of the tobacco mitochondrial transcriptome reveals expressed ORFs and numerous editing sites outside coding regions

Benjamin T Grimes; Awa K Sisay; Hyrum Carroll; A. Bruce Cahoon

Background: The purpose of this study was to sequence and assemble the tobacco mitochondrial transcriptome and obtain a genomic-level view of steady-state RNA abundance. Plant mitochondrial genomes have a small number of protein coding genes with large and variably sized intergenic spaces. In the tobacco mitogenome these intergenic spaces contain numerous open reading frames (ORFs) with no clear function. Results: The assembled transcriptome revealed distinct monocistronic and polycistronic transcripts along with large intergenic spaces with little to no detectable RNA. Eighteen of the 117 ORFs were found to have steady-state RNA amounts above background in both deep-sequencing and qRT-PCR experiments and ten of those were found to be polysome associated. In addition, the assembled transcriptome enabled a full mitogenome screen of RNA C→U editing sites. Six hundred and thirty five potential edits were found with 557 occurring within protein-coding genes, five in tRNA genes, and 73 in non-coding regions. These sites were found in every protein-coding transcript in the tobacco mitogenome. Conclusion: These results suggest that a small number of the ORFs within the tobacco mitogenome may produce functional proteins and that RNA editing occurs in coding and non-coding regions of mitochondrial transcripts.
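
A screen for C→U editing sites can be pictured as a position-by-position comparison of the genomic sequence with the transcript sequence. The sketch below is a deliberately simplified illustration that assumes the two sequences are already aligned and of equal length; it is not the pipeline used in the study, and the function name and example sequences are made up.

```python
def find_c_to_u_sites(genome, transcript):
    """Return 0-based positions where a genomic C reads as U (or T) in RNA.

    Assumes both sequences are pre-aligned and have the same length.
    Sequenced cDNA reports U as T, so both letters are accepted.
    """
    if len(genome) != len(transcript):
        raise ValueError("sequences must be aligned to the same length")
    return [
        position
        for position, (dna_base, rna_base) in enumerate(zip(genome, transcript))
        if dna_base == "C" and rna_base in ("U", "T")
    ]


sites = find_c_to_u_sites("ACCGCT", "ATCGUT")
```

In practice a site would only be called an edit when enough reads support it, which this sketch does not model.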


BMC Genomics | 2010

Analysis of long branch extraction and long branch shortening

Timothy O’Connor; Kenneth Sundberg; Hyrum Carroll; Mark J. Clement; Quinn Snell

Background: Long branch attraction (LBA) is a problem that afflicts both the parsimony and maximum likelihood phylogenetic analysis techniques. Research has shown that parsimony is particularly vulnerable to inferring the wrong tree in Felsenstein topologies. The long branch extraction method is a procedure to detect a data set suffering from this problem so that Maximum Likelihood could be used instead of Maximum Parsimony. Results: The long branch extraction method has been well cited and used by many authors in their analysis but no strong validation has been performed as to its accuracy. We performed such an analysis by an extensive search of the branch length search space under two topologies of six taxa, a Felsenstein-like topology and Farris-like topology. We also examine a long branch shortening method. Conclusions: The long branch extraction method seems to mask the majority of the search space rendering it ineffective as a detection method of LBA. A proposed alternative, the long branch shortening method, is also ineffective in predicting long branch attraction for all tree topologies.


Bioinformatics | 2010

PathGen: a transitive gene pathway generator

Kendell Clement; Nathaniel Gustafson; Amanda Berbert; Hyrum Carroll; Christopher Merris; Ammon Olsen; Mark J. Clement; Quinn Snell; Jared Allen; Randall J. Roper

SUMMARY Many online sources of gene interaction networks supply rich visual data regarding gene pathways that can aid in the study of biological processes, disease research and drug discovery. PathGen incorporates data from several sources to create transitive connections that span multiple gene interaction databases. Results are displayed in a comprehensible graphical format, showing gene interaction type and strength, database source and microarray expression data. These features make PathGen a valuable tool for in silico discovery of novel gene interaction pathways, which can be experimentally tested and verified. The usefulness of PathGen interaction analyses was validated using genes connected to the altered facial development related to Down syndrome. AVAILABILITY http://dna.cs.byu.edu/pathgen. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. Further information is available at http://dna.cs.byu.edu/pathgen/PathGenSupplemental.pdf.
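
The notion of a transitive connection spanning several databases can be sketched as a shortest-path search over a graph merged from multiple edge lists, with each edge remembering its source. The code below is an illustrative sketch only: the gene names, database names, and functions are hypothetical, interactions are treated as undirected, and nothing here reflects PathGen's actual implementation.

```python
from collections import deque


def merge_sources(sources):
    """Merge per-database edge lists into one adjacency map.

    sources: dict mapping a database name to a list of (gene_a, gene_b) pairs.
    Each neighbour entry keeps the database the interaction came from.
    """
    graph = {}
    for source_name, edges in sources.items():
        for gene_a, gene_b in edges:
            graph.setdefault(gene_a, []).append((gene_b, source_name))
            graph.setdefault(gene_b, []).append((gene_a, source_name))
    return graph


def transitive_path(graph, start, goal):
    """Breadth-first search for a shortest chain of interactions.

    Returns a list of (gene, source database of the edge used to reach it),
    or None when the two genes are not connected.
    """
    queue = deque([[(start, None)]])
    visited = {start}
    while queue:
        path = queue.popleft()
        gene = path[-1][0]
        if gene == goal:
            return path
        for neighbour, source in graph.get(gene, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [(neighbour, source)])
    return None


# Hypothetical data: neither database alone links G1 to G3.
sources = {"db1": [("G1", "G2")], "db2": [("G2", "G3")]}
path = transitive_path(merge_sources(sources), "G1", "G3")
```

Here the link between G1 and G3 only appears once the two sources are combined, which is the kind of connection the abstract calls transitive.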


International Journal of Bioinformatics Research and Applications | 2009

An open source phylogenetic search and alignment package

Hyrum Carroll; Adam R. Teichert; Jonathan L. Krein; Kenneth Sundberg; Quinn Snell; Mark J. Clement

PSODA is a comprehensive phylogenetics package, including alignment, phylogenetic search under both parsimony and maximum likelihood, and visualisation and analysis tools. PSODA offers performance comparable to PAUP* in an open source package that aims to provide a foundation for researchers examining new phylogenetic algorithms. A key new feature is PsodaScript, an extension to the nearly ubiquitous NEXUS format, that includes conditional and loop constructs; thereby allowing complex meta-search techniques like the parsimony ratchet to be easily and compactly implemented. PSODA promises to be a valuable tool in the future development of novel phylogenetic techniques. This paper seeks to familiarise researchers with PSODA and its features, in particular the internal scripting language, PsodaScript. PSODA is freely available from the PSODA.


Bioinformatics and Bioengineering | 2007

Using Parsimony to Guide Maximum Likelihood Searches

Kenneth Sundberg; Timothy O'Connor; Hyrum Carroll; Mark J. Clement; Quinn Snell

The performance of maximum likelihood searches can be boosted by using the most parsimonious tree as a starting point for the search. The time spent in performing the parsimony search to find this starting tree is insignificant compared to the time spent in the maximum likelihood search, leading to an overall gain in search time. These parsimony boosted maximum likelihood searches lead to topologies with scores statistically similar to the unboosted searches, but in less time.
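
Part of why a parsimony search is cheap is that scoring a tree under parsimony needs only one pass of set operations per site. The sketch below scores a single character on a rooted binary tree with the Fitch algorithm, a standard small-parsimony method; it is offered as background and is not claimed to be the search code used in the paper. The tree encoding and names are assumptions.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes for one character on a binary tree.

    tree: nested 2-tuples whose leaves are taxon names (strings).
    leaf_states: dict mapping each taxon name to its observed state.
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: keep both options and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


states = {"A": "G", "B": "G", "C": "T", "D": "T"}
good_tree = (("A", "B"), ("C", "D"))
poor_tree = (("A", "C"), ("B", "D"))
good_cost = fitch_score(good_tree, states)
poor_cost = fitch_score(poor_tree, states)
```

A search would sum this score over all sites and keep the lowest-scoring topology, which could then seed a maximum likelihood search as the abstract describes.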


International Conference on e-Science | 2014

Identification of Ancient Greek Papyrus Fragments Using Genetic Sequence Alignment Algorithms

Alex C. Williams; Hyrum Carroll; John F. Wallin; James H. Brusuelas; L. Fortson; Anne-Francoise Lamblin; Haoyu Yu

Papyrologists analyze, transcribe, and edit papyrus fragments in order to enrich modern lives by better understanding the linguistics, culture, and literature of the ancient world. One of their common tasks is to match an unknown fragment to a known manuscript. This is especially challenging when the fragments are damaged and contain only limited information (e.g., due to deterioration). In the last 100 years, only about 10% of the more than 500,000 fragments recovered from the Egyptian village of Oxyrhynchus have been edited. We do not know what new ancient texts might be found and what can be learned from them, but using current methods of identification this process will take in excess of 1000 years. The identification of an anonymous string of characters with a collection of known text sequences is ubiquitous in computational biology. Genes are often represented by a sequence of continuous characters, each of which denotes an amino acid. Relationships are inferred by finding multi-letter patterns shared between the anonymous sequence and a known sequence. This process is commonly referred to as genetic sequence alignment. In this paper, we introduce a novel methodology that uses modern genetic sequence alignment algorithms as a method for identifying Ancient Greek text fragments. This application will offer papyrologists and other professionals in the humanities the ability to rapidly identify severely damaged texts. This approach leverages a new form of non-contextual, multi-line text identification for the Greek language that can greatly accelerate the tedious task of transcription and identification.
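
The idea of matching a damaged fragment against known texts by local alignment can be sketched with a basic Smith-Waterman score. The abstract does not name the alignment algorithms the authors used, so Smith-Waterman is swapped in here purely as a familiar example; the scoring values, function names, and the transliterated sample texts are assumptions.

```python
def local_alignment_score(fragment, text, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score of fragment against text."""
    previous = [0] * (len(text) + 1)
    best = 0
    for fragment_char in fragment:
        current = [0]
        for j, text_char in enumerate(text, start=1):
            step = match if fragment_char == text_char else mismatch
            current.append(max(
                0,                      # start a new local alignment
                previous[j - 1] + step, # align the two characters
                previous[j] + gap,      # gap in the text
                current[j - 1] + gap,   # gap in the fragment
            ))
        best = max(best, max(current))
        previous = current
    return best


def identify(fragment, corpus):
    """Return the name of the known text with the highest alignment score."""
    return max(corpus, key=lambda name: local_alignment_score(fragment, corpus[name]))


# Transliterated toy corpus; '.' stands for an unreadable character.
corpus = {
    "iliad": "menin aeide thea",
    "odyssey": "andra moi ennepe mousa",
}
best_match = identify("aeid. thea", corpus)
```

Local alignment tolerates the unreadable character because a single mismatch costs less than the matches on either side of it gain.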


International Journal of Computational Biology and Drug Design | 2008

Parsimony accelerated Maximum Likelihood searches

Kenneth Sundberg; Timothy O'Connor; Hyrum Carroll; Mark J. Clement; Quinn Snell

Phylogenetic search is a key tool used in a variety of biological research endeavours. However, this search problem is known to be computationally difficult, due to the astronomically large search space, making the use of heuristic methods necessary. The performance of heuristic methods for finding Maximum Likelihood (ML) trees can be improved by using parsimony as an initial estimator for ML. The time spent in performing the parsimony search to boost performance is insignificant compared to the time spent in the ML search, leading to an overall gain in search time. These parsimony boosted ML searches lead to topologies with scores statistically similar to the unboosted searches, but in less time.


Bioinformatics and Bioengineering | 2006

Large Grain Size Stochastic Optimization Alignment

Perry G. Ridge; Hyrum Carroll; Dan Sneddon; Mark J. Clement; Quinn Snell

DNA sequence alignment is a critical step in identifying homology between organisms. The most widely used alignment program, ClustalW, is known to suffer from the local minima problem, where suboptimal guide trees produce incorrect gap insertions. The optimization alignment approach has been shown to be effective in combining alignment and phylogenetic search in order to avoid the problems associated with poor guide trees. The optimization alignment algorithm operates at a small grain size, aligning each tree found, wasting time producing multiple sequence alignments for suboptimal trees. This research develops and analyzes a large grain size algorithm for optimization alignment that iterates through steps of alignment and phylogeny search, thus improving the quality of guide trees used for computation of multiple sequence alignments and eliminating computation of multiple sequence alignments for suboptimal guide trees. Local minima are avoided by the use of stochastic search methods. Large Grain Size Stochastic Optimization Alignment (LGA) exploits the relationship between phylogenies and multiple sequence alignments, and in so doing achieves improved alignment accuracy. LGA is licensed under the GNU General Public License. Source code and data sets are publicly available at http://csl.cs.byu.edu/lga/

Collaboration


Top Co-Authors

Quinn Snell

Brigham Young University

Alex C. Williams

Middle Tennessee State University

John L. Spouge

National Institutes of Health

Perry G. Ridge

University of Nebraska–Lincoln
