Mathäus Dejori | Researchain

Publication

Featured research published by Mathäus Dejori.

BMC Bioinformatics | 2010

Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction

Lance Palmer; Mathäus Dejori; Randall A. Bolanos; Daniel Fasulo

Background: With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps.

Results: We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies.

Conclusions: Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
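To make the quantities named in this abstract concrete, here is a minimal Python sketch of two overlap statistics (percent mismatch and k-mer counts) and of N50. It is not the authors' code: the function names are hypothetical, and the overlap is assumed to be given as two already-aligned, equal-length strings. N50 is computed under its usual definition, the largest length L such that contigs of length L or more cover at least half of the total assembled bases, which is what the abstract loosely calls a median.

```python
from collections import Counter


def percent_mismatch(a: str, b: str) -> float:
    """Percentage of differing positions between two aligned, equal-length strings."""
    if len(a) != len(b):
        raise ValueError("aligned overlap regions must have equal length")
    if not a:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k substring of seq (hypothetical feature helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def n50(contig_lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half the bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all
```

In a pipeline like the one described, such per-overlap statistics would form the feature vector handed to a classifier; the comparative genomics score is omitted here because the abstract does not specify how it is computed.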

European Conference on Machine Learning | 2007

Bayesian Substructure Learning - Approximate Learning of Very Large Network Structures

Andreas Nägele; Mathäus Dejori; Martin Stetter

In recent years, Bayesian networks have become a popular framework for estimating the dependency structure of a set of variables. However, due to the NP-hardness of structure learning, this is a challenging task, and typical state-of-the-art algorithms fail to learn in domains with several thousand variables. In this paper we introduce a novel algorithm, called substructure learning, that reduces the complexity of learning large networks by splitting this task into several small subtasks. Instead of learning one complete network, we estimate the network structure iteratively by learning small subnetworks. Results from several benchmark cases show that substructure learning efficiently reconstructs the network structure in large domains with high accuracy.

World Congress on Computational Intelligence | 2008

Learning of Bayesian networks by a local discovery ant colony algorithm

Pedro Contreiras Pinto; Andreas Nägele; Mathäus Dejori; Thomas A. Runkler; João M. C. Sousa

Bayesian networks (BNs) are knowledge representation tools capable of representing dependence or independence relationships among the random variables that compose a problem domain. Bayesian networks learned from data sets are receiving increasing attention within the community of researchers of uncertainty in artificial intelligence, due to their capacity to provide good inference models and to discover the structure of complex domains. One approach to learning BNs from data is to use a scoring metric to evaluate the fitness of any given candidate network for the database, and apply an optimization procedure to explore the set of candidate networks. Among the most frequently used optimization methods for this purpose is greedy search, either deterministic or stochastic. This article proposes a hybrid Bayesian network learning algorithm, MMACO (max-min ACO), based on the local discovery algorithm max-min parents and children (MMPC) and ant colony optimization (ACO). MMPC is used to construct the skeleton of the Bayesian network, and ACO is then used to orient its edges, thus returning the final structure. We apply MMACO to several sets of benchmark networks and show that it outperforms greedy search (GS) and simulated annealing (SA) algorithms.
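The second stage described here, orienting the edges of a fixed skeleton, searches over directed graphs that must remain acyclic. The Python sketch below illustrates only that search space: it replaces ant colony optimization with brute-force enumeration, which is feasible only for tiny skeletons, and all function names are hypothetical. A scoring metric, as mentioned in the abstract, would then be applied to each candidate to pick the best one.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in turn."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(nodes)


def acyclic_orientations(nodes, skeleton):
    """Yield every orientation of the undirected skeleton that forms a DAG.

    Brute force over 2**len(skeleton) orientations; a stand-in for ACO, not ACO.
    """
    for flips in product([False, True], repeat=len(skeleton)):
        edges = [(v, u) if flip else (u, v)
                 for (u, v), flip in zip(skeleton, flips)]
        if is_acyclic(nodes, edges):
            yield edges
```

For a triangle skeleton, 6 of the 8 possible orientations are acyclic; the exponential growth of this space with the number of edges is why a heuristic optimizer is needed in practice.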

International Conference on Artificial Neural Networks | 2007

Structure learning with nonparametric decomposable models

Anton Schwaighofer; Mathäus Dejori; Volker Tresp; Martin Stetter

We present a novel approach to structure learning for graphical models. By using nonparametric estimates to model clique densities in decomposable models, both discrete and continuous distributions can be handled in a unified framework. Also, consistency of the underlying probabilistic model is guaranteed. Model selection is based on predictive assessment, with efficient algorithms that allow fast greedy forward and backward selection within the class of decomposable models. We show the validity of this structure learning approach on toy data, and on two large sets of gene expression data.
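As background for the clique densities mentioned above, the standard factorization of a decomposable model can be written as follows. The notation is an assumption for illustration, not taken from the paper: $\mathcal{C}$ is the set of cliques and $\mathcal{S}$ the multiset of separators of a junction tree of the graph.

```latex
p(x) = \frac{\prod_{C \in \mathcal{C}} p(x_C)}{\prod_{S \in \mathcal{S}} p(x_S)}
```

Under this factorization, structure learning reduces to choosing the graph, while each clique and separator marginal can be estimated separately, which is presumably where the nonparametric estimates enter.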

IEEE Transactions on Evolutionary Computation | 2009