Erik S. Wright | Researchain


Publication

Featured research published by Erik S. Wright.

Applied and Environmental Microbiology | 2012

DECIPHER, a Search-Based Approach to Chimera Identification for 16S rRNA Sequences

Erik S. Wright; L. Safak Yilmaz; Daniel R. Noguera

DECIPHER is a new method for finding 16S rRNA chimeric sequences by the use of a search-based approach. The method is based upon detecting short fragments that are uncommon in the phylogenetic group where a query sequence is classified but frequently found in another phylogenetic group. The algorithm was calibrated for full sequences (fs_DECIPHER) and short sequences (ss_DECIPHER) and benchmarked against WigeoN (Pintail), ChimeraSlayer, and Uchime using artificially generated chimeras. Overall, ss_DECIPHER and Uchime provided the highest chimera detection for sequences 100 to 600 nucleotides long (79% and 81%, respectively), but Uchime's performance deteriorated for longer sequences, while ss_DECIPHER maintained a high detection rate (89%). Both methods had low false-positive rates (1.3% and 1.6%). The more conservative fs_DECIPHER, benchmarked only for sequences longer than 600 nucleotides, had an overall detection rate lower than that of ss_DECIPHER (75%) but higher than those of the other programs. In addition, fs_DECIPHER had the lowest false-positive rate among all the benchmarked programs (<0.20%). DECIPHER was outperformed only by ChimeraSlayer and Uchime when chimeras were formed from closely related parents (less than 10% divergence). Given the differences in the programs, it was possible to detect over 89% of all chimeras with just the combination of ss_DECIPHER and Uchime. Using fs_DECIPHER, we detected between 1% and 2% additional chimeras in the RDP, SILVA, and Greengenes databases from which chimeras had already been removed with Pintail or Bellerophon. DECIPHER was implemented in the R programming language and is directly accessible through a webpage or by downloading the program as an R package (http://DECIPHER.cee.wisc.edu).
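The core idea in the abstract above (fragments that are rare in the query's own group but common in another group) can be illustrated with a minimal Python sketch. This is not the DECIPHER implementation, which is an R package: the function names, the use of fixed-length k-mers as the "short fragments", and the `rare` and `common` thresholds are all assumptions chosen for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping fragments of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def group_kmer_freq(seqs, k):
    """Fraction of sequences in a group that contain each k-mer."""
    counts = Counter()
    for s in seqs:
        counts.update(set(kmers(s, k)))  # count each k-mer once per sequence
    return {km: c / len(seqs) for km, c in counts.items()}


def suspicious_fragments(query, own_group, other_group, k=8,
                         rare=0.1, common=0.5):
    """Flag query k-mers that are rare in the query's own group but
    common in another group (thresholds are illustrative assumptions)."""
    own = group_kmer_freq(own_group, k)
    other = group_kmer_freq(other_group, k)
    return [(i, km) for i, km in enumerate(kmers(query, k))
            if own.get(km, 0.0) <= rare and other.get(km, 0.0) >= common]


# Toy data: a query classified into group A whose second half comes from group B.
group_a = ["AAAAAAAAAACCCCCCCCCC"] * 3
group_b = ["GGGGGGGGGGTTTTTTTTTT"] * 3
chimera = "AAAAAAAAAA" + "TTTTTTTTTT"
hits = suspicious_fragments(chimera, group_a, group_b, k=4)
```

In this toy case the flagged fragments all start in the second half of the query, which is the part borrowed from the other group. A real tool would also need calibrated thresholds and a way to classify the query in the first place.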

BMC Bioinformatics | 2015

DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment

Erik S. Wright

Background: Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, and then diminish steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments.
Results: Two predictors based on local sequence context were assessed: (i) single sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be utilized to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on large sequence sets.
Conclusions: Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins. Moreover, secondary structure predictions increase in accuracy as more sequences are used in the prediction. This enables the scalable generation of large sequence alignments that maintain high accuracy even on diverse sequence sets.
The DECIPHER R package and source code are freely available for download at DECIPHER.cee.wisc.edu and from the Bioconductor repository.

Applied and Environmental Microbiology | 2014

Automated Design of Probes for rRNA-Targeted Fluorescence In Situ Hybridization Reveals the Advantages of Using Dual Probes for Accurate Identification

Erik S. Wright; L. Safak Yilmaz; Andrew M. Corcoran; Hatice Eser Okten; Daniel R. Noguera

Fluorescence in situ hybridization (FISH) is a common technique for identifying cells in their natural environment and is often used to complement next-generation sequencing approaches as an integral part of the full-cycle rRNA approach. A major challenge in FISH is the design of oligonucleotide probes with high sensitivity and specificity to their target group. The rapidly expanding number of rRNA sequences has increased awareness of the number of potential nontargets for every FISH probe, making the design of new FISH probes challenging using traditional methods. In this study, we conducted a systematic analysis of published probes that revealed that many have insufficient coverage or specificity for their intended target group. Therefore, we developed an improved thermodynamic model of FISH that can be applied at any taxonomic level, used the model to systematically design probes for all recognized genera of bacteria and archaea, and identified potential cross-hybridizations for the selected probes. This analysis resulted in high-specificity probes for 35.6% of the genera when a single probe was used in the absence of competitor probes and for 60.9% when up to two competitor probes were used. Requiring the hybridization of two independent probes for positive identification further increased specificity. In this case, we could design highly specific probe sets for up to 68.5% of the genera without the use of competitor probes and 87.7% when up to two competitor probes were used. The probes designed in this study, as well as tools for designing new probes, are available online (http://DECIPHER.cee.wisc.edu).

PLOS ONE | 2012

Modeling formamide denaturation of probe-target hybrids for improved microarray probe design in microbial diagnostics

L. Safak Yilmaz; Alexander Loy; Erik S. Wright; Michael Wagner; Daniel R. Noguera

Application of high-density microarrays to the diagnostic analysis of microbial communities is challenged by the optimization of oligonucleotide probe sensitivity and specificity, as it is generally unfeasible to experimentally test thousands of probes. This study investigated the adjustment of hybridization stringency using formamide with the idea that sensitivity and specificity can be optimized during probe design if the hybridization efficiency of oligonucleotides with target and non-target molecules can be predicted as a function of formamide concentration. Sigmoidal denaturation profiles were obtained using fluorescently labeled and fragmented 16S rRNA gene amplicon of Escherichia coli as the target with increasing concentrations of formamide in the hybridization buffer. A linear free energy model (LFEM) was developed and microarray-specific nearest neighbor rules were derived. The model simulated formamide melting with a denaturant m-value that increased hybridization free energy (ΔG°) by 0.173 kcal/mol per percent of formamide added (v/v). Using the LFEM and specific probe sets, free energy rules were systematically established to predict the stability of single and double mismatches, including bulged and tandem mismatches. The absolute error in predicting the position of experimental denaturation profiles was less than 5% formamide for more than 90 percent of probes, enabling a practical level of accuracy in probe design. The potential of the modeling approach for probe design and optimization is demonstrated using a dataset including the 16S rRNA gene of Rhodobacter sphaeroides as an additional target molecule. The LFEM and thermodynamic databases were incorporated into a computational tool (ProbeMelt) that is freely available at http://DECIPHER.cee.wisc.edu.
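The linear dependence of free energy on formamide described above can be illustrated with a short Python sketch. Only the m-value of 0.173 kcal/mol per percent formamide comes from the abstract. The two-state sigmoid, the convention that ΔG° = 0 marks the half-denaturation point, the 42 °C temperature, and all function names are assumptions made for illustration; this is not the ProbeMelt implementation.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)
M_VALUE = 0.173     # kcal/mol per % formamide (v/v), as reported in the abstract


def delta_g(dg0, formamide_pct, m=M_VALUE):
    """Hybridization free energy, increasing linearly with formamide."""
    return dg0 + m * formamide_pct


def fraction_hybridized(dg0, formamide_pct, temp_c=42.0, m=M_VALUE):
    """Assumed two-state sigmoid: fraction of probe-target hybrids formed.
    The temperature default is a placeholder, not a value from the source."""
    rt = R_KCAL * (temp_c + 273.15)
    return 1.0 / (1.0 + math.exp(delta_g(dg0, formamide_pct, m) / rt))


def half_point(dg0, m=M_VALUE):
    """Formamide concentration (%) at which the assumed sigmoid reaches 0.5."""
    return -dg0 / m


# Hypothetical probe with a free energy of -5.19 kcal/mol at 0% formamide.
profile = [fraction_hybridized(-5.19, fa) for fa in range(0, 61, 10)]
```

Under these assumptions, a mismatch that raises the formamide-free free energy shifts the whole profile toward lower formamide concentrations, which is the kind of separation between target and non-target that stringency adjustment relies on.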

Applied Microbiology and Biotechnology | 2014

Mathematical tools to optimize the design of oligonucleotide probes and primers

Daniel R. Noguera; Erik S. Wright; Pamela Y. Camejo; L. Safak Yilmaz

The identification and quantification of specific organisms in mixed microbial communities often relies on the ability to design oligonucleotide probes and primers with high specificity and sensitivity. The design of these oligonucleotides (or “oligos” for short) shares many of the same principles in spite of their widely divergent applications. Three common molecular biology technologies that require oligonucleotide design are polymerase chain reaction (PCR), fluorescence in situ hybridization (FISH), and DNA microarrays. This article reviews techniques and software available for the design and optimization of oligos with the goal of targeting a specific group of organisms within mixed microbial communities. Strategies for enhancing specificity without compromising sensitivity are described, as well as design tools well suited for this purpose.

Bioinformatics | 2016

DesignSignatures: a tool for designing primers that yields amplicons with distinct signatures

Erik S. Wright; Kalin Vetsigian

For numerous experimental applications, PCR primers must be designed to efficiently amplify a set of homologous DNA sequences while giving rise to amplicons with maximally diverse signatures. We developed DesignSignatures to automate the process of designing primers for high-resolution melting (HRM), fragment length polymorphism (FLP) and sequencing experiments. The program also finds the best restriction enzyme to further diversify HRM or FLP signatures. This enables efficient comparison across many experimental designs in order to maximize signature diversity. Availability and implementation: DesignSignatures is accessible as a web tool at www.DECIPHER.cee.wisc.edu, or as part of the DECIPHER open source software package for R available from Bioconductor. Supplementary information: Supplementary data are available at Bioinformatics online.

bioRxiv | 2018

Stochastic exits from dormancy give rise to heavy-tailed distributions of descendants in bacterial populations

Erik S. Wright; Kalin Vetsigian

Variance in reproductive success is a major determinant of the degree of genetic drift in a population. While many plants and animals exhibit high variance in their number of progeny, far less is known about these distributions for microorganisms. Here, we used a strain barcoding approach to quantify variability in offspring number among replicate bacterial populations and developed a Bayesian method to infer the distribution of descendants from this variability. We applied our approach to measure the offspring distributions for 5 strains of bacteria from the genus Streptomyces after germination and growth in a homogeneous laboratory environment. The distributions of descendants were heavy-tailed, with a few cells effectively “winning the jackpot” to become a disproportionately large fraction of the population. This extreme variability in reproductive success largely traced back to initial populations of spores stochastically exiting dormancy, which provided early-germinating spores with an exponential advantage. In simulations with multiple dormancy cycles, heavy-tailed distributions of descendants decreased the effective population size by many orders of magnitude and led to allele dynamics differing substantially from classical population genetics models with matching effective population size. Collectively, these results demonstrate that extreme variability in reproductive success can occur even in growth conditions that are far more homogeneous than the natural environment. Thus, extreme variability in reproductive success might be an important factor shaping microbial population dynamics with implications for predicting the fate of beneficial mutations, interpreting sequence variability within populations, and explaining variability in infection outcomes across patients.
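The mechanism described above, in which spores that exit dormancy early gain an exponential head start, can be illustrated with a toy Python simulation. This is not the Bayesian inference method or the simulations from the study. The exponentially distributed waiting times, the rate values, the time horizon, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def simulate_descendants(n_spores=1000, germination_rate=0.2,
                         growth_rate=1.0, t_end=24.0, seed=0):
    """Toy model: each spore waits a random time before germinating
    (assumed exponential), then its lineage grows exponentially until t_end."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_spores):
        t_exit = rng.expovariate(germination_rate)  # time of exit from dormancy
        growth_time = max(0.0, t_end - t_exit)      # late spores stay at one cell
        sizes.append(math.exp(growth_rate * growth_time))
    return sizes


def top_share(sizes, fraction=0.01):
    """Share of the final population descended from the top `fraction` of spores."""
    ranked = sorted(sizes, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)


sizes = simulate_descendants()
share = top_share(sizes)  # share held by the earliest-germinating 1% of spores
```

In this sketch the spread in lineage sizes comes entirely from the timing of germination, since every lineage grows at the same rate once it starts. Increasing `growth_rate` relative to `germination_rate` concentrates more of the final population in the earliest lineages.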

Microbiome | 2018

IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences

Adithya Murali; Aniruddha Bhargava; Erik S. Wright

Background: Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest. Sequence classification is a critical component of this process, whereby sequences are assigned to a reference taxonomy containing known sequence representatives of many microbial groups. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy. This high rate of “over classification” is particularly detrimental in microbiome studies because reference taxonomies are far from comprehensive.
Results: Here, we introduce IDTAXA, a novel approach to taxonomic classification that employs principles from machine learning to reduce over classification errors. Using multiple reference taxonomies, we demonstrate that IDTAXA has higher accuracy than popular classifiers such as BLAST, MAPSeq, QIIME, SINTAX, SPINGO, and the RDP Classifier. Similarly, IDTAXA yields far fewer over classifications on Illumina mock microbial community data when the expected taxa are absent from the training set. Furthermore, IDTAXA offers many practical advantages over other classifiers, such as maintaining low error rates across varying input sequence lengths and withholding classifications from input sequences composed of random nucleotides or repeats.
Conclusions: IDTAXA’s classifications may lead to different conclusions in microbiome studies because of the substantially reduced number of taxa that are incorrectly identified through over classification. Although misclassification error is relatively minor, we believe that many remaining misclassifications are likely caused by errors in the reference taxonomy. We describe how IDTAXA is able to identify many putative mislabeling errors in reference taxonomies, enabling training sets to be automatically corrected by eliminating spurious sequences.
IDTAXA is part of the DECIPHER package for the R programming language, available through the Bioconductor repository or accessible online (http://DECIPHER.codes).

Science | 2017

Getting my feet wet

Erik S. Wright

After finishing my undergrad studies, I did what many newly minted electrical engineers do and moved to Silicon Valley. Working at Apple, I was living my dream of contributing to the very products I had always admired. Yet, after a couple of years, I began to feel as though something was missing. I

Environmental Microbiology | 2014