Katharina Jahn | Researchain

Publication

Featured research published by Katharina Jahn.

Science | 2014

The coffee genome provides insight into the convergent evolution of caffeine biosynthesis

Lorenzo Carretero-Paulet; Alexis Dereeper; Gaëtan Droc; Romain Guyot; Marco Pietrella; Chunfang Zheng; Adriana Alberti; François Anthony; Giuseppe Aprea; Jean-Marc Aury; Pascal Bento; Maria Bernard; Stéphanie Bocs; Claudine Campa; Alberto Cenci; Marie Christine Combes; Dominique Crouzillat; Corinne Da Silva; Loretta Daddiego; Fabien De Bellis; Stéphane Dussert; Olivier Garsmeur; Thomas Gayraud; Valentin Guignon; Katharina Jahn; Véronique Jamilloux; Thierry Joët; Karine Labadie; Tianying Lan; Julie Leclercq

Coffee, tea, and chocolate converge

Caffeine has evolved multiple times among plant species, but no one knows whether these events involved similar genes. Denoeud et al. sequenced the Coffea canephora (coffee) genome and identified a conserved gene order (see the Perspective by Zamir). Although this species underwent fewer genome duplications than related species, the relevant caffeine genes experienced tandem duplications that expanded their numbers within this species. Scientists have seen similar but independent expansions in distantly related species of tea and cacao, suggesting that caffeine might have played an adaptive role in coffee evolution.

Science, this issue p. 1181; see also p. 1124.

The genetic origins of coffee’s constituents reveal intriguing links to cacao and tea.

Coffee is a valuable beverage crop due to its characteristic flavor, aroma, and the stimulating effects of caffeine. We generated a high-quality draft genome of the species Coffea canephora, which displays a conserved chromosomal gene order among asterid angiosperms. Although it shows no sign of the whole-genome triplication identified in Solanaceae species such as tomato, the genome includes several species-specific gene family expansions, among them N-methyltransferases (NMTs) involved in caffeine production, defense-related genes, and alkaloid and flavonoid enzymes involved in secondary compound synthesis. Comparative analyses of caffeine NMTs demonstrate that these genes expanded through sequential tandem duplications independently of genes from cacao and tea, suggesting that caffeine in eudicots is of polyphyletic origin.

Journal of Computational Biology | 2009

Computation of Median Gene Clusters

Sebastian Böcker; Katharina Jahn; Julia Mixtacki; Jens Stoye

Whole genome comparison based on gene order has become a popular approach in comparative genomics. An important task in this field is the detection of gene clusters, i.e., sets of genes that occur co-localized in several genomes. For most applications, it is preferable to extend this definition to allow for small deviations in the gene content of the cluster occurrences. However, relaxing the equality constraint increases the computational complexity of gene cluster detection drastically. Existing approaches deal with this problem by using simplifying constraints on the cluster definition and/or allowing only pairwise genome comparison. In this article, we introduce a cluster concept named median gene clusters that improves over existing models, present efficient algorithms for their computation and show experimental results on the detection of approximate gene clusters in multiple genomes.
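To make the "median" idea concrete, here is a minimal sketch under an assumption of ours: the distance between two gene sets is taken to be the size of their symmetric difference (the article's exact distance and cluster definition may differ). For a fixed collection of cluster occurrences, each gene then contributes independently to the total distance, so keeping exactly the genes present in more than half of the occurrences gives a set of minimum total distance. The function names `median_gene_set` and `total_distance` are hypothetical.

```python
from collections import Counter


def set_distance(a: frozenset, b: frozenset) -> int:
    """Assumed set distance: size of the symmetric difference |a ^ b|."""
    return len(a ^ b)


def median_gene_set(occurrences: list[frozenset]) -> frozenset:
    """Return a gene set C minimizing the summed distance to all occurrences.

    Including a gene costs one for every occurrence lacking it; excluding it
    costs one for every occurrence containing it. Majority genes are therefore
    kept (genes in exactly half of the occurrences could go either way).
    """
    counts = Counter(gene for occ in occurrences for gene in occ)
    half = len(occurrences) / 2
    return frozenset(gene for gene, c in counts.items() if c > half)


def total_distance(center: frozenset, occurrences: list[frozenset]) -> int:
    """Sum of set distances from a candidate center to every occurrence."""
    return sum(set_distance(center, occ) for occ in occurrences)


if __name__ == "__main__":
    occs = [frozenset("abcd"), frozenset("abce"), frozenset("abd")]
    med = median_gene_set(occs)
    print(sorted(med), total_distance(med, occs))  # ['a', 'b', 'c', 'd'] 3
```

The hard part the article addresses is different: the occurrences are not given in advance but must be located as intervals in each genome, which is what makes approximate cluster detection computationally demanding.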

Research in Computational Molecular Biology | 2010

Efficient computation of approximate gene clusters based on reference occurrences

Katharina Jahn

Whole genome comparison based on the analysis of gene cluster conservation has become a popular approach in comparative genomics. While gene order and gene content as a whole randomize over time, it is observed that certain groups of genes, which are often functionally related, remain co-located across species. However, the conservation is usually not perfect, which turns the identification of these structures, often referred to as approximate gene clusters, into a challenging task. In this article, we present an efficient set distance based approach that computes approximate gene clusters by means of reference occurrences. We show that it yields results highly comparable to those of the corresponding non-reference based approach, while its polynomial runtime allows for approximate gene cluster detection in parameter ranges that used to be feasible only with simpler, e.g., max-gap based, gene cluster models. To further illustrate the performance and predictive power of our algorithm, we compare it to a state-of-the-art approach for max-gap gene cluster computation.
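A straightforward polynomial-time sketch of the reference-occurrence idea, not the article's algorithm, under assumptions of this illustration: genomes are lists of gene-family labels, the set distance is the symmetric difference, every interval of a chosen reference genome supplies a candidate gene set, and a candidate is reported when at least `min_genomes` of the other genomes contain an interval within distance `delta` of it. The function names and parameters (`reference_clusters`, `delta`, `min_genomes`, `min_size`) are hypothetical.

```python
def interval_sets(genome):
    """Yield (i, j, gene set of genome[i..j]) for every interval, both ends inclusive."""
    for i in range(len(genome)):
        content = set()
        for j in range(i, len(genome)):
            content.add(genome[j])
            yield i, j, frozenset(content)


def best_distance(candidate, genome):
    """Smallest symmetric-difference distance from candidate to any interval of genome."""
    return min(
        (len(candidate ^ s) for _, _, s in interval_sets(genome)),
        default=len(candidate),  # empty genome: compare against the empty set
    )


def reference_clusters(reference, others, delta, min_genomes, min_size=2):
    """Report reference intervals that approximately occur in enough other genomes.

    Returns (i, j, gene set, number of supporting genomes) for each distinct
    gene set of size >= min_size taken from an interval of `reference`.
    The reference genome itself is not counted among the supporting genomes.
    """
    seen = set()
    results = []
    for i, j, cand in interval_sets(reference):
        if len(cand) < min_size or cand in seen:
            continue  # too small, or the same gene set was already evaluated
        seen.add(cand)
        hits = sum(1 for g in others if best_distance(cand, g) <= delta)
        if hits >= min_genomes:
            results.append((i, j, cand, hits))
    return results


if __name__ == "__main__":
    ref = ["a", "b", "c", "x"]
    others = [["c", "a", "b", "y"], ["b", "z", "a"], ["q", "r"]]
    for hit in reference_clusters(ref, others, delta=1, min_genomes=2):
        print(hit)
```

Restricting candidates to reference intervals is what keeps the search space polynomial: there are only quadratically many candidate gene sets per reference genome, instead of every subset of the gene universe.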

BMC Bioinformatics | 2011

Swiftly Computing Center Strings

Franziska Hufsky; Leon Kuchenbecker; Katharina Jahn; Jens Stoye; Sebastian Böcker

Background: The center string (or closest string) problem is a classic computer science problem with important applications in computational biology. Given k input strings and a distance threshold d, we search for a string within Hamming distance at most d of each input string. This problem is NP-complete.

Results: In this paper, we focus on exact methods for the problem that are also swift in application. We first introduce data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions. We describe how to use this information to speed up two previously published search tree algorithms. Then, we describe a novel iterative search strategy that is efficient in practice and to which some of our reduction techniques can also be applied. Finally, we present results of an evaluation study for two different data sets from a biological application.

Conclusions: We find that the running time for computing the optimal center string is dominated by the subroutine calls for d = d_opt - 1 and d = d_opt, where d_opt is the smallest d for which a solution exists. Our data reduction is very effective for both, either rejecting unsolvable instances or solving trivial positions. We find that this speeds up computations considerably.
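For readers new to the problem, a minimal sketch of a classic bounded search tree for the decision version, following the fixed-parameter algorithm of Gramm, Niedermeier, and Rossmanith. It is not the paper's novel iterative strategy, and we do not claim it is one of the two search tree algorithms the paper accelerates. Equal-length inputs are assumed; `center_string` and `optimal_center` are hypothetical names.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def center_string(strings: list[str], d: int) -> Optional[str]:
    """Return a string within Hamming distance d of every input, or None."""

    def search(s: str, budget: int) -> Optional[str]:
        # Look for a center that differs from s in at most `budget` positions.
        if budget < 0:
            return None
        # Triangle inequality: if s is farther than d + budget from some input,
        # no string within `budget` of s can be within d of that input.
        if any(hamming(s, t) > d + budget for t in strings):
            return None
        far = next((t for t in strings if hamming(s, t) > d), None)
        if far is None:
            return s  # every input is already within distance d
        # A valid center disagrees with `far` in at most d positions, so it must
        # agree with `far` on at least one of any d + 1 positions where s != far.
        diff = [p for p in range(len(s)) if s[p] != far[p]][: d + 1]
        for p in diff:
            found = search(s[:p] + far[p] + s[p + 1:], budget - 1)
            if found is not None:
                return found
        return None

    # Any center is within distance d of the first input, so start there.
    return search(strings[0], d)


def optimal_center(strings: list[str]) -> tuple[int, str]:
    """Smallest feasible d (d_opt) and a matching center, trying d = 0, 1, 2, ..."""
    d = 0
    while True:
        center = center_string(strings, d)
        if center is not None:
            return d, center
        d += 1


if __name__ == "__main__":
    print(optimal_center(["ACGT", "ACGA", "TCGT"]))  # (1, 'ACGT')
```

The tree has depth at most d and branching factor d + 1. An outer loop like `optimal_center` presumably reflects the setting in which the abstract observes that the calls for d = d_opt - 1 (which must exhaust the tree to prove infeasibility) and d = d_opt dominate the running time.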

Biochimica et Biophysica Acta | 2017

Advances in understanding tumour evolution through single-cell sequencing

Jack Kuipers; Katharina Jahn; Niko Beerenwinkel

The mutational heterogeneity observed within tumours poses additional challenges to the development of effective cancer treatments. A thorough understanding of a tumour's subclonal composition and its mutational history is essential for designing treatments tailored to individual patients. Comparative studies on a large number of tumours permit the identification of mutational patterns which may refine forecasts of cancer progression, response to treatment and metastatic potential. The composition of tumours is shaped by evolutionary processes. Recent advances in next-generation sequencing offer the possibility to analyse the evolutionary history and accompanying heterogeneity of tumours at an unprecedented resolution, by sequencing single cells. New computational challenges arise when moving from bulk to single-cell sequencing data, leading to the development of novel modelling frameworks. In this review, we present the state-of-the-art methods for understanding the phylogeny encoded in bulk or single-cell sequencing data, and highlight future directions for developing more comprehensive and informative pictures of tumour evolution. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby.
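As a small illustration of the combinatorial core of phylogeny inference from mutation data (not a method from the review): under the infinite-sites assumption, with an unmutated (all-zero) root, a noise-free binary cell-by-mutation matrix admits a perfect phylogeny exactly when no pair of mutations shows all three patterns (1,1), (1,0) and (0,1). Real single-cell data contains dropouts, false positives and missing entries, which is why practical methods use probabilistic models instead of this exact test. `conflicting_pairs` is a hypothetical name.

```python
from itertools import combinations


def conflicting_pairs(matrix: list[list[int]]) -> list[tuple[int, int]]:
    """Return mutation pairs (column indices) that violate the perfect phylogeny condition.

    Rows are cells, columns are mutations, 1 = mutation present. With an
    all-zero root, the matrix fits a perfect phylogeny iff this list is empty.
    """
    n_mut = len(matrix[0]) if matrix else 0
    bad = []
    for a, b in combinations(range(n_mut), 2):
        patterns = {(row[a], row[b]) for row in matrix}
        # Mutations a and b must be nested or disjoint in a perfect phylogeny.
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            bad.append((a, b))
    return bad


if __name__ == "__main__":
    nested = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
    print(conflicting_pairs(nested))  # []
    print(conflicting_pairs([[1, 0], [0, 1], [1, 1]]))  # [(0, 1)]
```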

BMC Bioinformatics | 2013

Statistics for approximate gene clusters

Katharina Jahn; Sascha Winter; Jens Stoye; Sebastian Böcker

Background: Genes occurring co-localized in multiple genomes can be strong indicators of either functional constraints on genome organization or remnant ancestral gene order. The computational detection of these patterns, which are usually referred to as gene clusters, has become increasingly sensitive over the past decade. The most powerful approaches allow for various types of imperfect cluster conservation: cluster locations may be internally rearranged; individual cluster locations may contain only a subset of the cluster genes and may be disrupted by uninvolved genes; moreover, cluster locations may not occur at all in some or even most of the studied genomes. The detection of such low-quality clusters increases the risk of mistaking faint patterns that occur merely by chance for genuine findings. Therefore, it is crucial to estimate the significance of computational gene cluster predictions and discriminate between true conservation and coincidental clustering.

Results: In this paper, we present an efficient and accurate approach to estimate the significance of gene cluster predictions under the approximate common intervals model. Given a single gene cluster prediction, we calculate the probability of observing it with the same or a higher degree of conservation under the null hypothesis of random gene order, and add a correction factor to account for multiple testing. Our approach considers all parameters that define the quality of gene cluster conservation: the number of genomes in which the cluster occurs, the number of involved genes, the degree of conservation in the different genomes, as well as the frequency of the clustered genes within each genome. We apply our approach to evaluate gene cluster predictions in a large set of well-annotated genomes.
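The article's statistics cover the approximate common intervals model, including imperfect conservation and gene frequencies. The sketch below covers only a much simpler case, as an illustration of the null-hypothesis reasoning: each genome is a uniformly random order of the same n distinct genes, a cluster of k genes counts as conserved in a genome only if it is perfectly contiguous there, and genomes are independent. The Bonferroni-style multiplier stands in for the paper's correction factor and is our assumption; all names are hypothetical.

```python
from math import comb


def p_contiguous(n: int, k: int) -> float:
    """Chance that k given genes occupy one contiguous block in a random order of n genes.

    The k genes occupy a uniformly random set of positions (comb(n, k) choices),
    and exactly n - k + 1 of those position sets are contiguous.
    """
    return (n - k + 1) / comb(n, k)


def p_at_least(m: int, q: int, p: float) -> float:
    """Binomial tail: chance of seeing the block in at least q of m independent genomes."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(q, m + 1))


def corrected_p(n: int, k: int, m: int, q: int, n_tests: int) -> float:
    """Multiply by the number of tested hypotheses (Bonferroni-style), capped at 1."""
    return min(1.0, n_tests * p_at_least(m, q, p_contiguous(n, k)))


if __name__ == "__main__":
    # Hypothetical numbers: 5 genes contiguous in 3 of 4 genomes of 2000 genes each.
    print(corrected_p(n=2000, k=5, m=4, q=3, n_tests=10_000))
```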

Workshop on Algorithms in Bioinformatics | 2010

Swiftly computing center strings

Franziska Hufsky; Leon Kuchenbecker; Katharina Jahn; Jens Stoye; Sebastian Böcker

The center string (or closest string) problem is a classical computer science problem with important applications in computational biology. Given k input strings and a distance threshold d, we search for a string within Hamming distance at most d of each input string. This problem is NP-complete. In this paper, we focus on exact methods for the problem that are also fast in application. First, we introduce data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions. Then, we describe a novel search tree strategy that is very efficient in practice. Finally, we present results of an evaluation study for instances from a biological application. We find that data reduction is mandatory for the notoriously difficult case d = d_opt - 1.
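Two simple, provably safe reductions illustrate the kind of inference the abstract describes (these particular rules are our illustration and not necessarily the paper's). By the triangle inequality, no center within distance d exists if two inputs are more than 2d apart, so d_opt is at least half the largest pairwise Hamming distance, rounded up. In addition, a column where all inputs agree can be fixed to that character without increasing any distance. `lower_bound` and `reduce_instance` are hypothetical names.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def lower_bound(strings: list[str]) -> int:
    """ceil(max pairwise distance / 2): no center exists for any smaller d."""
    worst = max((hamming(a, b) for a, b in combinations(strings, 2)), default=0)
    return (worst + 1) // 2


def reduce_instance(strings: list[str], d: int):
    """Return None if (strings, d) is provably unsolvable, else (fixed, remaining).

    `fixed` maps column -> character for columns where all inputs agree;
    `remaining` lists the column indices a search still has to decide.
    """
    if d < lower_bound(strings):
        return None  # two inputs are more than 2d apart
    fixed, remaining = {}, []
    for col, chars in enumerate(zip(*strings)):
        if len(set(chars)) == 1:
            fixed[col] = chars[0]  # trivial position: copy the shared character
        else:
            remaining.append(col)
    return fixed, remaining


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "TCGT"]
    print(reduce_instance(data, 0))  # None
    print(reduce_instance(data, 1))  # ({1: 'C', 2: 'G'}, [0, 3])
```

Rejecting d = d_opt - 1 cheaply matters because a search tree can only prove infeasibility by exploring every branch.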

BMC Genomics | 2014

Identifying gene clusters by discovering common intervals in indeterminate strings

Daniel Doerr; Jens Stoye; Sebastian Böcker; Katharina Jahn

Background: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of the gene family assignments of genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher-order features of gene products. Errors introduced in this process amplify in subsequent gene order analyses and thus may degrade gene cluster prediction.

Results: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction suitable in scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure of the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes.

Conclusions: Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions that are missed by gene family-based methods due to wrong or deficient gene family assignments.
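A toy sketch of the indeterminate-string representation, under simplifying assumptions of ours that do not reproduce the article's model: each genome is a list in which every gene position holds the set of gene families it might belong to (for example, from several conflicting annotations); the content of an interval is taken to be the union of those sets; and a family set is reported when some interval in each genome has exactly that content. The names and the `max_len` cap are hypothetical.

```python
def interval_contents(genome: list[set[str]], max_len: int):
    """Map each interval content (union of candidate families) to the intervals producing it."""
    contents: dict[frozenset, list[tuple[int, int]]] = {}
    for i in range(len(genome)):
        acc: set[str] = set()
        for j in range(i, min(len(genome), i + max_len)):
            acc |= genome[j]  # extend the interval by one indeterminate position
            contents.setdefault(frozenset(acc), []).append((i, j))
    return contents


def common_intervals(g1, g2, min_size=2, max_len=10):
    """Family sets realized as interval content in both indeterminate strings."""
    c1 = interval_contents(g1, max_len)
    c2 = interval_contents(g2, max_len)
    return {
        fams: (c1[fams], c2[fams])
        for fams in c1.keys() & c2.keys()
        if len(fams) >= min_size
    }


if __name__ == "__main__":
    g1 = [{"A"}, {"B", "C"}, {"D"}]
    g2 = [{"D"}, {"C", "B"}, {"A"}, {"E"}]
    for fams, (occ1, occ2) in sorted(common_intervals(g1, g2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(fams), occ1, occ2)
```

Taking the union is a deliberately loose choice: it lets an ambiguous position match any of its candidate families, whereas a stricter model would have to choose one family per position consistently.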

Models and Algorithms for Genome Evolution | 2013

The Potential of Family-Free Genome Comparison

Marília D. V. Braga; Cedric Chauve; Daniel Doerr; Katharina Jahn; Jens Stoye; Annelyse Thévenin; Roland Wittler

Many methods in computational comparative genomics require gene family assignments as a prerequisite. While the biological concept of gene families is well established, their computational prediction remains unreliable. This paper continues a new line of research in which family assignments are not presumed. We study the potential of several family-free approaches for detecting conserved structures and genome rearrangements, and for reconstructing ancestral gene orders.

Research in Computational Molecular Biology | 2008

Computation of median gene clusters

Sebastian Böcker; Katharina Jahn; Julia Mixtacki; Jens Stoye

Whole genome comparison based on gene order has become a popular approach in comparative genomics. An important task in this field is the detection of gene clusters, i.e., sets of genes that occur co-localized in several genomes. For most applications, it is preferable to extend this definition to allow for small deviations in the gene content of the cluster occurrences. However, relaxing the equality constraint increases the computational complexity of gene cluster detection drastically. Existing approaches deal with this problem by using simplifying constraints on the cluster definition and/or allowing only pairwise genome comparison. In this paper, we introduce a cluster concept named median gene clusters that improves over existing models and present efficient algorithms for their computation that allow for the detection of approximate gene clusters in multiple genomes.
