Stefan Kurtz | Researchain

Publication

Featured research published by Stefan Kurtz.

Genome Biology | 2004

Versatile and open software for comparing large genomes

Stefan Kurtz; Adam M. Phillippy; Arthur L. Delcher; Michael Smoot; Martin Shumway; Corina Antonescu

The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.

Journal of Discrete Algorithms | 2004

Replacing suffix trees with enhanced suffix arrays

Mohamed Abouelhoda; Stefan Kurtz; Enno Ohlebusch

The suffix tree is one of the most important data structures in string processing and comparative genomics. However, the space consumption of the suffix tree is a bottleneck in large-scale applications such as genome analysis. In this article, we will overcome this obstacle. We will show how every algorithm that uses a suffix tree as its data structure can systematically be replaced with an algorithm that uses an enhanced suffix array and solves the same problem in the same time complexity. The generic name enhanced suffix array stands for data structures consisting of the suffix array and additional tables. Our new algorithms are not only more space efficient than previous ones, but they are also faster and easier to implement.
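As a rough illustration of the phrase "the suffix array and additional tables", the following sketch builds a suffix array together with one such additional table, the longest-common-prefix (LCP) table, and uses the suffix array for exact pattern search. It is not the paper's construction: the function names are hypothetical, the suffix array is built by naive sorting rather than a linear-time method, and the search is a plain binary search rather than the enhanced-suffix-array algorithms the abstract refers to.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting; fine for short strings, far too slow for genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_table(text, sa):
    """lcp[i] is the length of the longest common prefix of the suffixes
    starting at sa[i-1] and sa[i]; lcp[0] is defined as 0."""
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b = sa[i - 1], sa[i]
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        lcp[i] = k
    return lcp


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern in text, in increasing order.

    Uses two binary searches over the suffix array to locate the block of
    suffixes that begin with pattern.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa, lcp_table(text, sa), find_occurrences(text, sa, "ana"))
```

The two tables together use a fixed number of integers per input character, which is the kind of space saving over a pointer-based suffix tree that the abstract describes.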

PLOS Computational Biology | 2009

Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures

Steve Hoffmann; Christian Otto; Stefan Kurtz; Cynthia M. Sharma; Philipp Khaitovich; Jörg Vogel; Peter F. Stadler; Jörg Hackermüller

With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While the most frequent error type in Illumina reads is mismatches, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching in particular of short reads with diverse errors is therefore a pressing practical problem. We introduce a matching model for short reads that can, besides mismatches, also cope with indels. It addresses different error models. For example, it can handle the problem of leading and trailing contaminations caused by primers and poly-A tails in transcriptomics or the length-dependent increase of error rates. In these contexts, it thus simplifies the tedious and error-prone trimming step. For efficient searches, our method utilizes index structures in the form of enhanced suffix arrays. In a comparison with current methods for short read mapping, the presented approach shows significantly increased performance not only for 454 reads, but also for Illumina reads. Our approach is implemented in the software segemehl available at http://www.bioinf.uni-leipzig.de/Software/segemehl/.

Bioinformatics | 1999

REPuter: fast computation of maximal repeats in complete genomes.

Stefan Kurtz; Chris Schleiermacher

Summary: A software tool was implemented that computes exact repeats and palindromes in entire genomes very efficiently. Availability: Via the Bielefeld Bioinformatics Server (http://bibiserv.techfak.uni-bielefeld.de/reputer/).

Software - Practice and Experience | 1999

Reducing the space requirement of suffix trees

Stefan Kurtz

We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different type. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained.

BMC Bioinformatics | 2008

LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons

David Ellinghaus; Stefan Kurtz; Ute Willhoeft

Background: Transposable elements are abundant in eukaryotic genomes and it is believed that they have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only few high quality genome wide annotations of transposable elements. Therefore, there is a considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs). Results: We have developed a software tool LTRharvest for the de novo detection of full length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high quality annotations based on known LTR transposon features like length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and specificity of 100% and 72%, respectively. This is comparable or slightly better than annotations for previous software tools. The main advantage of LTRharvest over previous tools is (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as an open source software. Conclusion: LTRharvest is an efficient software tool delivering high quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approx. 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, the structured design and implementation and the availability as open source provide an excellent base for incorporating novel concepts to further improve prediction of LTR retrotransposons.

Computational Systems Bioinformatics | 2003

Local similarity in RNA secondary structures

Matthias Höchsmann; Thomas Töller; Robert Giegerich; Stefan Kurtz

We present a systematic treatment of alignment distance and local similarity algorithms on trees and forests. We build upon the tree alignment algorithm for ordered trees given by Jiang et al. (1995) and extend it to calculate local forest alignments, which is essential for finding locally similar regions in RNA secondary structures. The time complexity of our algorithm is $O(|F_1| \cdot |F_2| \cdot \deg(F_1) \cdot \deg(F_2) \cdot (\deg(F_1) + \deg(F_2)))$, where $|F_i|$ is the number of nodes in forest $F_i$ and $\deg(F_i)$ is the degree of $F_i$. We provide carefully engineered dynamic programming implementations using dense, two-dimensional tables, which considerably reduce the space requirement. We suggest a new representation of RNA secondary structures as forests that allows reasonable scoring of edit operations on RNA secondary structures. The comparison of RNA secondary structures is facilitated by a new visualization technique for RNA secondary structure alignments. Finally, we show how potential regulatory motifs can be discovered solely by their structural preservation, and independent of their sequence conservation and position.

American Journal of Pathology | 2012

Genomic Deletion of PTEN Is Associated with Tumor Progression and Early PSA Recurrence in ERG Fusion-Positive and Fusion-Negative Prostate Cancer

Antje Krohn; Tobias Diedler; Lia Burkhardt; Pascale Sophie Mayer; Colin De Silva; Marie Meyer-Kornblum; Darja Kötschau; Pierre Tennstedt; Joseph Huang; Clarissa Gerhäuser; Malte Mader; Stefan Kurtz; Hüseyin Sirma; Fred Saad; Thomas Steuber; Markus Graefen; Christoph Plass; Guido Sauter; Ronald Simon; Sarah Minner; Thorsten Schlomm

The phosphatase and tensin homolog deleted on chromosome 10 (PTEN) gene is often altered in prostate cancer. To determine the prevalence and clinical significance of the different mechanisms of PTEN inactivation, we analyzed PTEN deletions in TMAs containing 4699 hormone-naïve and 57 hormone-refractory prostate cancers using fluorescence in situ hybridization analysis. PTEN mutations and methylation were analyzed in subsets of 149 and 34 tumors, respectively. PTEN deletions were present in 20.2% (458/2266) of prostate cancers, including 8.1% heterozygous and 12.1% homozygous deletions, and were linked to advanced tumor stage (P < 0.0001), high Gleason grade (P < 0.0001), presence of lymph node metastasis (P = 0.0002), hormone-refractory disease (P < 0.0001), presence of ERG gene fusion (P < 0.0001), and nuclear p53 accumulation (P < 0.0001). PTEN deletions were also associated with early prostate-specific antigen recurrence in univariate (P < 0.0001) and multivariate (P = 0.0158) analyses. The prognostic impact of PTEN deletion was seen in both ERG fusion-positive and ERG fusion-negative tumors. PTEN mutations were found in 4 (12.9%) of 31 cancers with heterozygous PTEN deletions but in only 1 (2%) of 59 cancers without PTEN deletion (P = 0.027). Aberrant PTEN promoter methylation was not detected in 34 tumors. The results of this study demonstrate that biallelic PTEN inactivation, by either homozygous deletion or deletion of one allele and mutation of the other, occurs in most PTEN-defective cancers and characterizes a particularly aggressive subset of metastatic and hormone-refractory prostate cancers.

BMC Genomics | 2008

A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

Stefan Kurtz; Apurva Narechania; Joshua C. Stein; Doreen Ware

Background: The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of k-mers, has been previously used to distinguish TEs from low-copy genic regions; but currently available software solutions are impractical due to high memory requirements or specialization for specific user-tasks. Results: Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the k-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10^9 bp). We analyzed k-mer frequencies for a wide range of k. At this low genome coverage (≈ 0.45×) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible k-mers. Similar low complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-C0t derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy k-mers, suggesting that transposons are still active in maize. Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across cultivars. Among one hundred annotated bacterial artificial chromosomes (BACs), k-mer frequency could be used to detect transposon-encoded genes with 92% sensitivity, compared to 96% using alignment-based repeat masking, while both methods showed 92% specificity. Conclusion: The Tallymer software was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available. For more information on the software, see http://www.zbh.uni-hamburg.de/Tallymer.
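To make the notion of k-mer frequency concrete, here is a minimal sketch that counts k-mers in a short sequence and reports the fraction of k-mer positions occupied by high-copy k-mers. It deliberately swaps in a hash-based counter for brevity; Tallymer itself is based on enhanced suffix arrays, and the function names and the `min_count` threshold below are hypothetical, not part of the Tallymer software.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in sequence (case-insensitive)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repetitive_fraction(sequence, k, min_count):
    """Fraction of k-mer positions whose k-mer occurs at least min_count times."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # sequence shorter than k
    repetitive = sum(c for c in counts.values() if c >= min_count)
    return repetitive / total


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(kmer_counts(seq, 4))
    print(repetitive_fraction(seq, 4, 2))
```

A hash table like this stores every distinct k-mer explicitly and must be rebuilt for each k, which hints at why an index that supports many values of k at once is attractive for large genomes.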

Algorithmica | 1997

From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction

Robert Giegerich; Stefan Kurtz

We review the linear-time suffix tree constructions by Weiner, McCreight, and Ukkonen. We use the terminology of the most recent algorithm, Ukkonen's on-line construction, to explain its historic predecessors. This reveals relationships much closer than one would expect, since the three algorithms are based on rather different intuitive ideas. Moreover, it completely explains the differences between these algorithms in terms of simplicity, efficiency, and implementation complexity.