Publication


Featured research published by Daniel Doerr.


PLOS ONE | 2014

Orthology detection combining clustering and synteny for very large datasets.

Marcus Lechner; Maribel Hernandez-Rosales; Daniel Doerr; Nicolas Wieseke; Annelyse Thévenin; Jens Stoye; Roland K. Hartmann; Sonja J. Prohaska; Peter F. Stadler

The elucidation of orthology relationships is an important step both in gene function prediction and in understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities because more exact approaches are too computationally expensive. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF substantially reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of Proteinortho, on which it is based, making the software applicable to very large datasets.


BMC Bioinformatics | 2012

Gene family assignment-free comparative genomics

Daniel Doerr; Annelyse Thévenin; Jens Stoye

Background: The comparison of relative gene orders between two genomes offers deep insights into functional correlations of genes and the evolutionary relationships between the corresponding organisms. Methods for gene order analyses often require prior knowledge of homologies between all genes of the genomic dataset. Since such information is hard to obtain, it is common to predict homologous groups based on sequence similarity. These hypothetical groups of homologous genes are called gene families.

Results: This manuscript promotes a new branch of gene order studies in which prior assignment of gene families is not required. As a case study, we present a new similarity measure between pairs of genomes that is related to the breakpoint distance. We propose an exact and a heuristic algorithm for its computation. We evaluate our methods on a dataset comprising 12 γ-proteobacteria from the literature.

Conclusions: In evaluating our algorithms, we show that the exact algorithm is suitable for computations on small genomes. Moreover, the results of our heuristic are close to those of the exact algorithm. In general, we demonstrate that gene order studies can be improved by direct, gene family assignment-free comparisons.
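To make the breakpoint-related idea concrete, the following is a minimal sketch of the classical, family-based setting that the abstract contrasts itself with. It assumes each genome is a linear, unsigned sequence in which every gene identifier occurs exactly once, and counts the adjacencies shared by two genomes. The function names and the toy genomes are illustrative assumptions; this is not the family-free measure proposed in the paper.

```python
def adjacencies(genome):
    """Return the set of unordered neighbouring gene pairs of a linear genome."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def conserved_adjacencies(genome_a, genome_b):
    """Count adjacencies present in both genomes (unsigned, no duplicate genes)."""
    return len(adjacencies(genome_a) & adjacencies(genome_b))


def breakpoints(genome_a, genome_b):
    """Count adjacencies of genome_a that are not conserved in genome_b."""
    return len(adjacencies(genome_a)) - conserved_adjacencies(genome_a, genome_b)


# Toy example (hypothetical gene identifiers): g3 and g4 swap places in b.
a = ["g1", "g2", "g3", "g4", "g5"]
b = ["g1", "g2", "g4", "g3", "g5"]
print(conserved_adjacencies(a, b), breakpoints(a, b))
```

In a family-free setting, the hard 0/1 notion of "same gene" used here would be replaced by pairwise similarity scores between genes, which is what makes the computation harder.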


BMC Genomics | 2014

Identifying gene clusters by discovering common intervals in indeterminate strings

Daniel Doerr; Jens Stoye; Sebastian Böcker; Katharina Jahn

Background: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments of genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher order features of gene products. Errors introduced in this process are amplified in subsequent gene order analyses and may thus degrade gene cluster prediction.

Results: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction suitable in scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure of the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes.

Conclusions: Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions that are missed by gene family-based methods due to wrong or deficient gene family assignments.
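For orientation, the classical notion of a common interval can be sketched on two permutations: a set of genes is a common interval if it occupies a contiguous stretch in both gene orders. The quadratic-time enumeration below assumes each gene occurs exactly once in each genome, which is the simplest family-based case. It does not implement the indeterminate-string model of the paper, and the function name and example data are assumptions made for illustration.

```python
def common_intervals(perm_a, perm_b):
    """Enumerate gene sets (size >= 2) that are contiguous in both permutations."""
    pos_b = {gene: idx for idx, gene in enumerate(perm_b)}
    found = []
    for i in range(len(perm_a)):
        lo = hi = pos_b[perm_a[i]]
        for j in range(i + 1, len(perm_a)):
            p = pos_b[perm_a[j]]
            lo, hi = min(lo, p), max(hi, p)
            # perm_a[i..j] is contiguous in perm_b exactly when its positions
            # in perm_b span j - i + 1 slots, i.e. hi - lo == j - i.
            if hi - lo == j - i:
                found.append(frozenset(perm_a[i:j + 1]))
    return found


# Toy example (hypothetical gene labels).
perm_a = [1, 2, 3, 4, 5]
perm_b = [3, 1, 2, 5, 4]
for interval in common_intervals(perm_a, perm_b):
    print(sorted(interval))
```

Faster algorithms for this problem are known; the nested loop is kept here only because it mirrors the definition directly.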


Models and Algorithms for Genome Evolution | 2013

The Potential of Family-Free Genome Comparison

Marília D. V. Braga; Cedric Chauve; Daniel Doerr; Katharina Jahn; Jens Stoye; Annelyse Thévenin; Roland Wittler

Many methods in computational comparative genomics require gene family assignments as a prerequisite. While the biological concept of gene families is well established, their computational prediction remains unreliable. This paper continues a new line of research in which family assignments are not presumed. We study the potential of several family-free approaches in detecting conserved structures, genome rearrangements and in reconstructing ancestral gene orders.


Journal of Computational Biology | 2017

New Genome Similarity Measures based on Conserved Gene Adjacencies

Daniel Doerr; Luis Antonio Brasil Kowada; Eloi Araujo; Shachi Deshpande; Simone Dantas; Bernard M. E. Moret; Jens Stoye

Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful, but also most complex, models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases the computational costs are the same as in the general family-free case, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that differ in expressive power. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, demonstrating the applicability and performance of our newly defined similarity measures.


Advances in Geographic Information Systems | 2012

Accelerating investigation of food-borne disease outbreaks using pro-active geospatial modeling of food supply chains

Daniel Doerr; Kun Hu; Sondra R. Renly; Stefan Edlund; Matthew Davis; James H. Kaufman; Justin Lessler; Matthias Filter; A. Käsbohrer; Bernd Appel

Over the last decades the globalization of trade has significantly altered the topology of food supply chains. Even though food-borne illness has been consistently on the decline, the hazardous impact of contamination events is larger [1-3]. Possible contaminants include pathogenic bacteria, viruses, parasites, toxins or chemicals. Contamination can occur accidentally, e.g. due to improper handling, preparation, or storage, or intentionally, as the melamine milk crisis showed. To identify the source of a food-borne disease it is often necessary to reconstruct the food distribution networks spanning different distribution channels or product groups. The time needed to trace back the contamination source ranges from days to weeks and significantly influences the economic and public health impact of a disease outbreak. In this paper we describe a model-based approach designed to speed up the identification of a food-borne disease outbreak source. Further, we exploit the geospatial information of wholesaler-retailer food distribution networks limited to a given food type and apply a gravity model for food distribution from retailer to consumer. We present a likelihood framework for determining the likelihood that one or more wholesale sources distributed contaminated food, based on geo-coded case reports. The developed method is independent of the underlying food distribution kernel and is thus particularly applicable to empirical distributions of food acquisition.
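The combination of a gravity model with a likelihood over candidate sources can be illustrated with a small sketch. It assumes an inverse-power kernel (a consumer picks a retailer with probability proportional to retailer size divided by distance raised to an exponent) and treats case reports as independent. The kernel, the exponent of 2, the retailer sizes, the coordinates and all names are hypothetical choices made for illustration; the abstract states that the actual method is independent of the distribution kernel.

```python
import math


def retailer_probs(consumer, retailers, exponent=2.0):
    """Gravity model: P(retailer) is proportional to size / distance**exponent."""
    weights = {}
    for name, (x, y, size) in retailers.items():
        dist = math.hypot(consumer[0] - x, consumer[1] - y)
        # Guard against division by zero when a case sits on a retailer.
        weights[name] = size / max(dist, 1e-9) ** exponent
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def log_likelihood(cases, supplied, retailers):
    """Log-likelihood that every case bought from a retailer in `supplied`."""
    ll = 0.0
    for case in cases:
        probs = retailer_probs(case, retailers)
        ll += math.log(sum(probs[r] for r in supplied))
    return ll


# Hypothetical toy network: (x, y, size) per retailer, retailers per wholesaler.
retailers = {"r1": (0.0, 0.0, 1.0), "r2": (10.0, 0.0, 1.0), "r3": (0.0, 10.0, 1.0)}
wholesalers = {"w1": {"r1"}, "w2": {"r2", "r3"}}
cases = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # geo-coded case reports

scores = {w: log_likelihood(cases, rs, retailers) for w, rs in wholesalers.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because all three cases lie close to `r1`, the wholesaler supplying `r1` receives the higher log-likelihood in this toy setup.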


Research in Computational Molecular Biology | 2016

New Genome Similarity Measures Based on Conserved Gene Adjacencies

Luis Antonio Brasil Kowada; Daniel Doerr; Simone Dantas; Jens Stoye

Many important questions in molecular biology, evolution and biomedicine can be addressed by comparative genomics approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example to elucidate the phylogenetic relationships between species.


Workshop on Algorithms in Bioinformatics | 2011

Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions

Daniel Doerr; Ilan Gronau; Shlomo Moran; Irad Yavneh

Distance-based phylogenetic reconstruction methods use the evolutionary distances between species to reconstruct the tree spanning them. This paper continues a line of research that attempts to select, for each given set of input sequences, a distance function that maximizes the expected accuracy of the reconstructed tree. We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the accuracy of reconstruction.
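As one concrete example of a distance function derived from a very simple evolutionary model, the Jukes-Cantor correction converts the observed fraction of mismatching sites p between two aligned DNA sequences into an evolutionary distance d = -(3/4) ln(1 - 4p/3). The sketch below is only an illustration of what such a distance function looks like; the abstract does not state which models the paper compares, and the function names are assumptions.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor (JC69) distance: expected substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at saturation; report an infinite distance.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(jukes_cantor_distance("ACGTACGT", "ACGTACGA"))
```

A matrix of such pairwise distances is the input to distance-based tree reconstruction methods such as neighbor joining.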


BMC Genomics | 2018

GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data

Tizian Schulz; Jens Stoye; Daniel Doerr

Background: Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species.

Results: We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse.

Conclusions: By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the subject of further experimental investigations.


Algorithms for Molecular Biology | 2017

The gene family-free median of three

Daniel Doerr; Metin Balaban; Pedro Feijão; Cedric Chauve

Background: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

Methods: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of
