Justin Schonfeld | Researchain

Publication

Featured research published by Justin Schonfeld.

BMC Ecology | 2011

When species matches are unavailable are DNA barcodes correctly assigned to higher taxa? An assessment using sphingid moths

John James Wilson; Rodolphe Rougerie; Justin Schonfeld; Daniel H. Janzen; Winnie Hallwachs; Mehrdad Hajibabaei; Ian J. Kitching; Jean Haxaire; Paul D. N. Hebert

Background: When a specimen belongs to a species not yet represented in DNA barcode reference libraries there is disagreement over the effectiveness of using sequence comparisons to assign the query accurately to a higher taxon. Library completeness and the assignment criteria used have been proposed as critical factors affecting the accuracy of such assignments but have not been thoroughly investigated. We explored the accuracy of assignments to genus, tribe and subfamily in the Sphingidae, using the almost complete global DNA barcode reference library (1095 species) available for this family. Costa Rican sphingids (118 species), a well-documented, diverse subset of the family, with each of the tribes and subfamilies represented were used as queries. We simulated libraries with different levels of completeness (10-100% of the available species), and recorded assignments (positive or ambiguous) and their accuracy (true or false) under six criteria.

Results: A liberal tree-based criterion assigned 83% of queries accurately to genus, 74% to tribe and 90% to subfamily, compared to a strict tree-based criterion, which assigned 75% of queries accurately to genus, 66% to tribe and 84% to subfamily, with a library containing 100% of available species (but excluding the species of the query). The greater number of true positives delivered by more relaxed criteria was negatively balanced by the occurrence of more false positives. This effect was most sharply observed with libraries of the lowest completeness where, for example at the genus level, 32% of assignments were false positives with the liberal criterion versus < 1% when using the strict. We observed little difference (< 8% using the liberal criterion) however, in the overall accuracy of the assignments between the lowest and highest levels of library completeness at the tribe and subfamily level.

Conclusions: Our results suggest that when using a strict tree-based criterion for higher taxon assignment with DNA barcodes, the likelihood of assigning a query a genus name incorrectly is very low, if a genus name is provided it has a high likelihood of being accurate, and if no genus match is available the query can nevertheless be assigned to a subfamily with high accuracy regardless of library completeness. DNA barcoding often correctly assigned sphingid moths to higher taxa when species matches were unavailable, suggesting that barcode reference libraries can be useful for higher taxon assignments long before they achieve complete species coverage.

Congress on Evolutionary Computation | 2005

Nonlinear projection for the display of high dimensional distance data

Daniel Ashlock; Justin Schonfeld

Display and visualization of high dimensional data are typically performed with a well-chosen linear projection of the data or by displaying many linear projections to form an animation. This study presents an evolutionary algorithm for producing nonlinear projections of high dimensional data with cues, in the drawing of the projection, as to the types of distortions introduced. Such projections can provide drawings closer to the true high dimensional distances of the displayed data than any single linear drawing. This permits a researcher to view a good analog to a scatter plot for high dimensional data. The system is demonstrated on a synthetic four dimensional fitness landscape and on distance data derived from RNA folds. Because fitness landscapes often have more dimensions than can be easily visualized, it is difficult to gain an intuitive understanding of a fitness landscape. The nonlinear projection algorithm is applied to an abstraction of the fitness landscape called a fitness web. Fitness webs can be used to display the relative quality of optima, the frequency with which they were found by different evolutionary runs, or other factors of interest. In addition to displaying the relative position of optima in a fitness landscape, a graph of the fitness function along the edges of a fitness web displays important slices of the fitness landscape. Called fitness morphs, these plots can provide intuition about the fitness landscapes as well as direction for subsequent evolutionary searches. The second demonstration of the nonlinear projection algorithm is to data generated from an ad hoc metric on RNA folds. The algorithm yields drawings that permit a researcher to correctly distinguish two different types of folds for iron response elements.
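The idea of evolving a planar drawing whose distances stay close to the original high dimensional distances can be illustrated with a minimal sketch. This is not the authors' algorithm: the squared-error `stress` measure, the single-point Gaussian mutation, and the names `stress` and `evolve_projection` are all assumptions chosen for illustration, and the distortion cues described in the abstract are not modeled.

```python
import math
import random


def stress(points, dist):
    """Sum of squared differences between planar and target distances."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            planar = math.dist(points[i], points[j])
            total += (planar - dist[i][j]) ** 2
    return total


def evolve_projection(dist, generations=2000, sigma=0.1, seed=0):
    """Hypothetical (1+1) evolutionary search for a low-stress 2D layout."""
    rng = random.Random(seed)
    n = len(dist)
    # Start from a random placement of every item in the plane.
    best = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    best_stress = stress(best, dist)
    for _ in range(generations):
        child = list(best)
        k = rng.randrange(n)
        x, y = child[k]
        # Mutate one point with Gaussian noise (an assumed operator).
        child[k] = (x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
        child_stress = stress(child, dist)
        # Keep the child only if it distorts the distances no more.
        if child_stress <= best_stress:
            best, best_stress = child, child_stress
    return best, best_stress
```

Because a child replaces its parent only when its stress is no higher, the returned stress never exceeds that of the initial random layout; a population-based algorithm would explore the space more broadly.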

Genetic and Evolutionary Computation Conference | 2008

Using coevolution to understand and validate game balance in continuous games

Ryan E. Leigh; Justin Schonfeld

We attack the problem of game balancing by using a coevolutionary algorithm to explore the space of possible game strategies and counter strategies. We define balanced games as games which have no single dominating strategy. Balanced games are more fun and provide a more interesting strategy space for players to explore. However, proving that a game is balanced mathematically may not be possible and industry commonly uses extensive and expensive human testing to balance games. We show how a coevolutionary algorithm can be used to test game balance and use the publicly available continuous state, capture-the-flag CaST game as our testbed. Our results show that we can use coevolution to highlight game imbalances in CaST and provide intuition towards balancing this game. This aids in eliminating dominating strategies, thus making the game more interesting as players must constantly adapt to opponent strategies.
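The definition of balance used above (no single dominating strategy) can be made concrete with a small sketch. It assumes a finite set of strategies and a hypothetical payoff matrix in which `payoff[i][j] > 0` means strategy `i` beats strategy `j`; the paper instead explores a continuous strategy space with coevolution, which this sketch does not attempt.

```python
def dominating_strategies(payoff):
    """Return indices of strategies that beat every other strategy."""
    n = len(payoff)
    return [
        i
        for i in range(n)
        if all(payoff[i][j] > 0 for j in range(n) if j != i)
    ]


def is_balanced(payoff):
    """A game counts as balanced here if no strategy dominates all others."""
    return not dominating_strategies(payoff)


# Rock-paper-scissors: each strategy beats one rival and loses to the other.
ROCK_PAPER_SCISSORS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

# A hypothetical unbalanced game in which strategy 0 beats everything.
UNBALANCED = [
    [0, 1, 1],
    [-1, 0, 1],
    [-1, -1, 0],
]
```

In a coevolutionary setting the payoff entries would come from matches between evolved strategies and counter strategies rather than from a fixed table.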

IEEE Symposium on Visual Languages | 2000

Using the cognitive walkthrough to improve the design of a visual programming experiment

Thomas R. G. Green; Margaret M. Burnett; Andrew J. Ko; Karen J. Rothermel; Curtis R. Cook; Justin Schonfeld

Visual programming languages aim to promote usability, but are rarely examined for it. One reason is the difficulty of designing successful experimental evaluations. We propose the cognitive walkthrough as an aid to improve experimental designs. This is a novel application of an HCI-derived technique designed for evaluating interfaces rather than experiments. The technique focuses on the potential difficulties of novice users and is therefore particularly suited for evaluating the programming situation, which is knowledge-based and non-routine. We describe an empirical study performed without benefit of a walkthrough and show how the study was improved by a series of walkthroughs. We found the method to be quick to use, effective at improving the experimental design, and usable by non-specialists.

BioSystems | 2013

Sequence classification with side effect machines evolved via ring optimization.

Andrew McEachern; Daniel Ashlock; Justin Schonfeld

The explosion of available sequence data necessitates the development of sophisticated machine learning tools with which to analyze them. This study introduces a sequence-learning technology called side effect machines. It also applies a model of evolution which simulates the evolution of a ring species to the training of the side effect machines. A comparison is done between side effect machines evolved in the ring structure and side effect machines evolved using a standard evolutionary algorithm based on tournament selection. At the core of the training of side effect machines is a nearest neighbor classifier. A parameter study was performed to investigate the impact of the division of training data into examples for nearest neighbor assessment and training cases. The parameter study demonstrates that parameter setting is important in the baseline runs but had little impact in the ring-optimization runs. The ring optimization technique was also found to exhibit improved and more reliable training performance. Side effect machines are tested on two types of synthetic data, one based on GC-content and the other checking for the ability of side effect machines to recognize an embedded motif. Three types of biological data are used: a data set with different types of immune-system genes, a data set with normal and retro-virally derived human genomic sequence, and standard and nonstandard initiation regions from the cytochrome-oxidase subunit one in the mitochondrial genome.
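The split of the data into examples for nearest neighbor assessment and training cases can be sketched as follows. The feature vectors, the labels, the Euclidean distance, and the function names are assumptions for illustration; the abstract does not specify how the authors' classifier measures distance or scores a machine.

```python
import math


def nearest_neighbor_label(query, examples):
    """Return the label of the example vector closest to the query."""
    best_label, best_dist = None, math.inf
    for vector, label in examples:
        d = math.dist(query, vector)  # Euclidean distance (assumed)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def accuracy(training_cases, examples):
    """Fraction of training cases whose nearest example shares their label."""
    correct = sum(
        1
        for vector, label in training_cases
        if nearest_neighbor_label(vector, examples) == label
    )
    return correct / len(training_cases)
```

Under this reading, a machine that produces feature vectors with higher `accuracy` on the training cases would be judged fitter during evolution.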

Computational Intelligence in Bioinformatics and Computational Biology | 2006

Filtration and Depth Annotation Improve Non-linear Projection for RNA Motif Discovery

Justin Schonfeld; Daniel Ashlock

This study presents a strategy for reducing the effects of noise on the location of RNA motifs in the context of a previously developed analysis pipeline. The pipeline was developed to search for novel RNA motifs incorporating both primary and secondary structure. The ability of the pipeline to detect motifs in the presence of a relatively large amount of sequence not containing a target motif is examined in three different experiments. The first demonstrates the impact of increasing the number of sequences without a particular motif in a synthetic data set. The second experiment looks at how well a known motif, the iron response element, clusters in biological data sets with various amounts of non-IRE motif containing sequence. The final experiment applies and analyzes the effects of a number-near-neighbors filter to winnow data and highlight the presence of the clusters representing motifs. The filter is found to help substantially
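A number-near-neighbors filter of the kind mentioned above can be sketched as follows, assuming it keeps an item only when enough other items lie within a fixed distance of it. The `radius` and `min_neighbors` parameters and the function name are hypothetical; the abstract does not give the filter's actual definition or settings.

```python
def near_neighbor_filter(dist, radius, min_neighbors):
    """Return indices of items with at least min_neighbors others within radius."""
    keep = []
    for i, row in enumerate(dist):
        # Count every other item whose distance to item i is within the radius.
        neighbors = sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        if neighbors >= min_neighbors:
            keep.append(i)
    return keep
```

Items that sit alone, as sequence without the target motif is expected to, fall below the neighbor threshold and are dropped, leaving the denser clusters easier to see.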

Computational Intelligence in Bioinformatics and Computational Biology | 2005

Depth Annotation of RNA Folds for Secondary Structure Motif Search

Daniel Ashlock; Justin Schonfeld

The biological activity of RNA depends on the way it folds into secondary structures. Presented here is a framework for exploratory motif searching in the space of RNA secondary structures. A collection of RNA sequences, suspected of having a particular biological activity, is fragmented into overlapping pieces of a uniform size. Each piece is folded and the details of the fold are used to annotate the primary structure. Distances between annotated structures are computed. The distance matrix for the structures is then projected into the Euclidean plane for visualization and detection of clusters. A motif is taken to be a cluster in the two dimensional space. An instance of the framework is implemented for testing on a data set containing examples of the Iron Response Element in the following manner. Folding is performed with the Mfold package. A depth-of-fold that records stems and loops onto the primary sequence is used to annotate the pieces of RNA. Dynamic programming is used to find distances between pieces of annotated primary sequence. An evolutionary algorithm is then used to find a one-to-one mapping of pieces of RNA to points in the plane that has acceptable distortion of the distances found with dynamic programming. This one-to-one mapping is a form of non-linear projection that optimizes for fidelity of projected distances to the distances derived from the Iron Response Element data set.
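Two steps of this framework, fragmenting sequences into overlapping pieces of uniform size and computing distances with dynamic programming, can be sketched as follows. The `step` parameter and the use of plain edit distance are assumptions; the paper's distance operates on depth-annotated sequence, and folding with Mfold is not shown.

```python
def fragment(seq, size, step):
    """Cut seq into overlapping pieces of uniform size, one every step symbols."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming (a stand-in metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # match or substitute
                )
            )
        prev = cur
    return prev[-1]


def distance_matrix(pieces):
    """Pairwise distances between all pieces, ready for projection."""
    return [[edit_distance(p, q) for q in pieces] for p in pieces]
```

The resulting matrix plays the role of the distance matrix that is then projected into the plane; `edit_distance` accepts any sequences, so annotated pieces could be passed as lists of symbols.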

Computational Intelligence in Bioinformatics and Computational Biology | 2010

Classifying Cytochrome c Oxidase subunit 1 by translation initiation mechanism using side effect machines

Justin Schonfeld; Daniel Ashlock

Cytochrome c oxidase subunit 1 (cox1) is unusual among mitochondrial genes in that instead of using AUG or one of the recognized alternative start codons it often appears to use an unknown means for initiating translation. However, the frequency of this unusual behavior as well as the underlying molecular mechanism are unknown. In this paper we use side effect machines to probe for signal in the sequence. Evolved side effect machines were able to correctly classify cox1 genes with ambiguous start codons 80.1% of the time. Side effect machines are finite state machines that have side effects associated with their states. In this study a simple side effect, a counter for the number of times the state was entered, is used. The problem is found to be challenging: a substantial majority of replicates found no signal, but some classifiers with statistically significant classification ability were located.
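A side effect machine with the counter side effect described above can be sketched in a few lines. The transition-table layout, the choice not to count the start state before any symbol is read, and the two-state example machine are assumptions for illustration, not the evolved machines from the paper.

```python
def run_sem(transitions, sequence, start=0):
    """Run a finite state machine and count how often each state is entered."""
    counts = [0] * len(transitions)
    state = start
    for symbol in sequence:
        state = transitions[state][symbol]
        counts[state] += 1  # side effect: one counter per state
    return counts


# Hypothetical two-state machine: state 1 is entered on every G or C,
# state 0 on every A or T, so the counts summarize GC-content.
GC_MACHINE = [
    {"A": 0, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 1, "T": 0},
]
```

For example, `run_sem(GC_MACHINE, "GATTACA")` returns `[5, 2]`; a vector of counts like this is what a downstream classifier would consume.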

IEEE International Conference on Evolutionary Computation | 2006

Evaluating Distance Measures for RNA Motif Search

Justin Schonfeld; Daniel Ashlock

This paper extends an earlier study which outlined a bioinformatic pipeline for exploratory search for RNA motifs incorporating both primary and secondary structure. The pipeline is applied to three data sets, one of which is a larger version of that used in the earlier study. Instead of a single method of estimating the distance between RNA folds, four distance measures were tested. The data sets are: a set of random control sequences, a set of synthetic sequences with simple designed folds, and the iron response element data set for which actual biological RNA folds are available. The pipeline demonstrates the ability to produce clusters that contain known motifs in the biological data and those designed into the synthetic data. The results for the distance measures vary substantially and one of the measures, difference in energy, is found to be too simplistic to be useful for differentiating motifs. The other three distance measures all demonstrate some degree of merit. At the heart of the pipeline is a non-linear projection algorithm that uses evolutionary computation to display the intra-RNA-fold distances so that the various distance measures can be visually compared. While the performance of this algorithm is acceptable, suggestions for improving it are made.

Foundations of Computational Intelligence | 2014

Test problems and representations for graph evolution

Daniel Ashlock; Justin Schonfeld; Lee-Ann Barlow; Colin Lee

Graph evolution - evolving a graph or network to fit specific criteria - is a recent enterprise because of the difficulty of representing a graph in an easily evolvable form. Simple, obvious representations such as adjacency matrices can prove to be very hard to evolve and some easy-to-evolve representations place severe limits on the space of graphs that is explored. This study fills in a gap in the literature by presenting two scalable families of benchmark functions. These functions are tested on a number of representations. The first family of benchmark functions is matching the eccentricity sequences of graphs, the second is locating graphs that are relatively easy to color non-optimally. One hundred examples of the eccentricity sequence matching problem are tested. The examples have a difficulty, measured in time to solution, that varies through four orders of magnitude, demonstrating that this test problem exhibits scalability even within a particular size of problem. The ordering by problem hardness, for different representations, varies significantly from representation to representation. For the difficult coloring problem, a parameter study is presented demonstrating that the problem exhibits very different results for different algorithm parameters, demonstrating its effectiveness as a benchmark problem.
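The first benchmark family, matching eccentricity sequences, rests on a quantity that is easy to compute. The sketch below assumes a connected, undirected graph stored as an adjacency list, and the `sequence_mismatch` score is a hypothetical fitness measure; the abstract does not state how the authors score a candidate graph.

```python
from collections import deque


def eccentricities(adj):
    """Sorted eccentricity sequence of a connected graph given as adjacency lists."""
    result = []
    for source in range(len(adj)):
        # Breadth-first search gives shortest path lengths from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            raise ValueError("graph must be connected")
        # Eccentricity is the distance to the farthest vertex.
        result.append(max(dist.values()))
    return sorted(result)


def sequence_mismatch(adj, target):
    """Hypothetical fitness: total gap between actual and target sequences."""
    return sum(abs(a - b) for a, b in zip(eccentricities(adj), sorted(target)))
```

A path on four vertices has eccentricity sequence `[2, 2, 3, 3]`, while a four-cycle has `[2, 2, 2, 2]`, so an evolving graph can be scored by how far its sequence is from the target.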
