Publication


Featured research published by Robert Giegerich.


BMC Bioinformatics | 2004

A comprehensive comparison of comparative RNA structure prediction approaches

Paul P. Gardner; Robert Giegerich

Background: An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet independent benchmarking of these algorithms is rarely performed, as is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms. Results: Here we evaluate a number of RNA folding algorithms using reliable RNA data sets and compare their relative performance. Conclusions: We conclude that comparative data can enhance structure prediction, but structure prediction algorithms vary widely in terms of both sensitivity and selectivity across different lengths and homologies. Furthermore, we outline some directions for future research.
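
The two metrics named in this abstract can be illustrated with a minimal sketch. It assumes structures are given in dot-bracket notation and uses the plain textbook definitions (shared base pairs divided by reference pairs, and by predicted pairs); the paper's exact definitions may differ, and the function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def sensitivity_selectivity(reference, predicted):
    """Compare a predicted structure against a reference structure.

    sensitivity = true positives / pairs in the reference
    selectivity = true positives / pairs in the prediction
    """
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    selectivity = true_positives / len(pred) if pred else 0.0
    return sensitivity, selectivity
```

For example, a prediction that recovers one of two reference pairs and predicts nothing else has sensitivity 0.5 and selectivity 1.0.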


BMC Bioinformatics | 2004

Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

Jens Reeder; Robert Giegerich

Background: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n^6) time and O(n^4) space algorithm by Rivas and Eddy is currently the best available program. Results: We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n^4) time and O(n^2) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. Conclusions: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm.


Bioinformatics | 2006

RNAshapes: an integrated RNA analysis package based on abstract shapes

Peter Steffen; Björn Voß; Marc Rehmsmeier; Jens Reeder; Robert Giegerich

We introduce RNAshapes, a new software package that integrates three RNA analysis tools based on the abstract shapes approach: the analysis of shape representatives, the calculation of shape probabilities and the consensus shapes approach. This new package is completely reimplemented in C and outruns the original implementations significantly in runtime and memory requirements. Additionally, we added a number of useful features like suboptimal folding with correct dangling energies, structure graph output, shape matching and a sliding window approach.
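
To give a feel for what an abstract shape is, here is a rough sketch that maps a dot-bracket structure to its most abstract form: unpaired bases are dropped, and helices interrupted only by bulges or internal loops are merged into a single bracket pair. This is an illustration under those assumptions, not the RNAshapes implementation; the function name is hypothetical, and an unfolded structure is returned as an empty string here.

```python
def abstract_shape(dot_bracket):
    """Reduce a dot-bracket structure to its helix nesting pattern."""
    # Each stack entry collects the shapes of the helices found so far
    # inside the currently open base pair (the bottom entry is the exterior).
    stack = [[]]
    for ch in dot_bracket:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            inner = stack.pop()
            if len(inner) == 1:
                # Stacked pair, bulge, or internal loop: same helix as inside.
                shape = inner[0]
            else:
                # Hairpin (no inner helix) or multiloop (several inner helices).
                shape = "[" + "".join(inner) + "]"
            stack[-1].append(shape)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return "".join(stack[0])
```

Under this sketch a simple hairpin such as `((..))` becomes `[]`, and a multiloop enclosing two hairpins becomes `[[][]]`, so many concrete structures collapse onto the same shape.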


Computational Systems Bioinformatics | 2003

Local similarity in RNA secondary structures

Matthias Höchsmann; Thomas Töller; Robert Giegerich; Stefan Kurtz

We present a systematic treatment of alignment distance and local similarity algorithms on trees and forests. We build upon the tree alignment algorithm for ordered trees given by Jiang et al. (1995) and extend it to calculate local forest alignments, which is essential for finding locally similar regions in RNA secondary structures. The time complexity of our algorithm is O(|F1| · |F2| · deg(F1) · deg(F2) · (deg(F1) + deg(F2))), where |Fi| is the number of nodes in forest Fi and deg(Fi) is the degree of Fi. We provide carefully engineered dynamic programming implementations using dense, two-dimensional tables, which considerably reduce the space requirement. We suggest a new representation of RNA secondary structures as forests that allows reasonable scoring of edit operations on RNA secondary structures. The comparison of RNA secondary structures is facilitated by a new visualization technique for RNA secondary structure alignments. Finally, we show how potential regulatory motifs can be discovered solely by their structural preservation, independent of their sequence conservation and position.


Algorithmica | 1997

From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction

Robert Giegerich; Stefan Kurtz

We review the linear-time suffix tree constructions by Weiner, McCreight, and Ukkonen. We use the terminology of the most recent algorithm, Ukkonen's on-line construction, to explain its historic predecessors. This reveals relationships much closer than one would expect, since the three algorithms are based on rather different intuitive ideas. Moreover, it completely explains the differences between these algorithms in terms of simplicity, efficiency, and implementation complexity.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2004

Pure Multiple RNA Secondary Structure Alignments: A Progressive Profile Approach

Matthias Höchsmann; Björn Voss; Robert Giegerich

In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Structure comparison is, in analogy to sequence comparison, the essential technique to infer related function. We provide a method for computing multiple alignments of RNA secondary structures under the tree alignment model, which is suitable to cluster RNA molecules purely on the structural level, i.e., sequence similarity is not required. We give a systematic generalization of the profile alignment method from strings to trees and forests. We introduce a tree profile representation of RNA secondary structure alignments which allows reasonable scoring in structure comparison. Besides the technical aspects, an RNA profile is a useful data structure to represent multiple structures of RNA sequences. Moreover, we propose a visualization of RNA consensus structures that is enriched by the full sequence information.


Monatshefte für Chemie | 1996

Analysis of RNA sequence structure maps by exhaustive enumeration I. Neutral networks

Walter Grüner; Robert Giegerich; Dirk Strothmann; Christian M. Reidys; Jacqueline Weber; Ivo L. Hofacker; Peter F. Stadler; Peter Schuster

Global relations between RNA sequences and secondary structures are understood as mappings from sequence space into shape space. These mappings are investigated by exhaustive folding of all GC and AU sequences with chain lengths up to 30. The computed structural data are evaluated through exhaustive enumeration and used as an exact reference for testing analytical results derived from mathematical models and sampling based on statistical methods. Several new concepts of RNA sequence to secondary structure mappings are investigated, among them that of neutral networks (sets of sequences folding into the same structure). Exhaustive enumeration allows testing of several previously suggested relations: the number of (minimum free energy) secondary structures as a function of the chain length, as well as the frequency distribution of structures at constant chain length (commonly resulting in generalized forms of Zipf's law).
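
The idea of a neutral network can be made concrete with a small sketch: fold every sequence of a given length over a two-letter alphabet and group the sequences by the structure they fold into. Note the substitution made here: the paper folds by minimum free energy, whereas this sketch uses simple base-pair maximization (Nussinov-style) with a minimum hairpin loop of three bases, so the resulting networks are only illustrative. All function names and the `MIN_LOOP` constant are assumptions of this sketch.

```python
from collections import Counter
from itertools import product

MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop
GC_PAIRS = frozenset({("G", "C"), ("C", "G")})


def fold(seq, pairs=GC_PAIRS):
    """Return one maximum-base-pair structure in dot-bracket notation."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case: k pairs with j
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    # Deterministic traceback, so equal sequences always give equal structures.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in pairs:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return "".join(structure)


def neutral_networks(length, alphabet="GC"):
    """Map each structure to the number of sequences that fold into it."""
    networks = Counter()
    for letters in product(alphabet, repeat=length):
        networks[fold("".join(letters))] += 1
    return networks
```

Calling `neutral_networks(10).most_common(5)` lists the five largest networks for length 10; the run time grows as 2^length, which is why exhaustive enumeration is limited to short chains.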


Monatshefte für Chemie | 1996

Analysis of RNA sequence structure maps by exhaustive enumeration II. Structures of neutral networks and shape space covering

Walter Grüner; Robert Giegerich; Dirk Strothmann; Christian M. Reidys; Jacqueline Weber; Ivo L. Hofacker; Peter F. Stadler; Peter Schuster

The relations between RNA sequences and secondary structures are investigated by exhaustive folding of all GC and AU sequences with chain lengths up to 30. The technique of tries is used for economic data storage and fast retrieval of information. The computed structural data are evaluated through exhaustive enumeration and used as an exact reference for testing analytical results derived from mathematical models and sampling based on statistical methods. Several new concepts of RNA sequence to secondary structure mappings are investigated, among them the structure of neutral networks (sets of RNA sequences folding into the same structure), percolation of sequence space by neutral networks, and the principle of shape space covering. The data of exhaustive enumeration are compared to the analytical results of a random graph model that reveals the generic properties of sequence to structure mappings based on some base pairing logic. The differences between the numerical and the analytical results are interpreted in terms of specific biophysical properties of RNA molecules.


Software - Practice and Experience | 2003

Efficient implementation of lazy suffix trees

Robert Giegerich; Stefan Kurtz; Jens Stoye

We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees that requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different types. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated only when it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods.
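
The lazy evaluation idea described in this abstract can be sketched as follows: each node keeps the list of suffixes that pass through it, and its children are built only when a search first reaches it. This is a simplified illustration, not the paper's implementation; it ignores the space-efficient representation entirely, and the class names, the `evaluated` counter, and the `$` sentinel are assumptions of this sketch.

```python
class LazyNode:
    """A suffix tree node whose children are built only on first visit."""

    def __init__(self, suffixes, depth):
        self.suffixes = suffixes  # start positions of suffixes below this node
        self.depth = depth        # number of characters on the path from the root
        self.children = None      # None means "not evaluated yet"


class LazySuffixTree:
    def __init__(self, text, sentinel="$"):
        if sentinel in text:
            raise ValueError("sentinel must not occur in the text")
        self.text = text + sentinel
        self.root = LazyNode(list(range(len(self.text))), 0)
        self.evaluated = 0  # number of nodes expanded so far

    def _edge_length(self, group, depth):
        """Length of the longest common prefix of the suffixes beyond depth."""
        text, n = self.text, len(self.text)
        if len(group) == 1:
            return n - (group[0] + depth)  # leaf edge runs to the end of the text
        length = 1
        while (all(s + depth + length < n for s in group)
               and len({text[s + depth + length] for s in group}) == 1):
            length += 1
        return length

    def _expand(self, node):
        """Top-down step: split the suffixes of a node by their next character."""
        groups = {}
        for s in node.suffixes:
            if s + node.depth < len(self.text):
                groups.setdefault(self.text[s + node.depth], []).append(s)
        node.children = {}
        for ch, group in groups.items():
            length = self._edge_length(group, node.depth)
            start = group[0] + node.depth
            node.children[ch] = (start, length, LazyNode(group, node.depth + length))
        self.evaluated += 1

    def occurrences(self, pattern):
        """Return all start positions of a non-empty pattern in the text."""
        if not pattern:
            raise ValueError("pattern must not be empty")
        node, matched = self.root, 0
        while matched < len(pattern):
            if node.children is None:
                self._expand(node)  # evaluate this subtree only now
            entry = node.children.get(pattern[matched])
            if entry is None:
                return []
            start, length, child = entry
            k = min(length, len(pattern) - matched)
            if self.text[start:start + k] != pattern[matched:matched + k]:
                return []
            matched += k
            node = child
        return list(node.suffixes)
```

After `tree = LazySuffixTree("banana")`, no node has been expanded; a call such as `tree.occurrences("ana")` expands only the nodes on the search path, which is the behavior that makes the lazy variant attractive when only a few patterns are searched.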


BMC Bioinformatics | 2006

Fast index based algorithms and software for matching position specific scoring matrices

Michael Beckstette; Robert Homann; Robert Giegerich; Stefan Kurtz

Background: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. Results: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to factor 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330. Conclusion: Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than |A|^m + m - 1, where m is the length of the PSSM and A is a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript.
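
The relation between a p-value and a score threshold can be illustrated with a small sketch. It assumes integer PSSM scores and independent background symbol frequencies, computes the full score distribution by dynamic programming, and then scans a sequence naively. This is neither ESAsearch nor the paper's lazy evaluation scheme (the whole distribution is computed eagerly, and no index is used); all names are hypothetical.

```python
from collections import defaultdict


def score_distribution(pssm, background):
    """Probability of each total score for a random sequence of PSSM length.

    pssm: list of columns, each a dict mapping symbol -> integer score.
    background: dict mapping symbol -> probability.
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for symbol, s in column.items():
                nxt[score + s] += prob * background[symbol]
        dist = dict(nxt)
    return dist


def threshold_for_pvalue(pssm, background, p):
    """Lowest score t with P(score >= t) <= p, or None if no score qualifies."""
    dist = score_distribution(pssm, background)
    tail, threshold = 0.0, None
    for score in sorted(dist, reverse=True):
        tail += dist[score]
        if tail > p:
            break
        threshold = score
    return threshold


def scan(pssm, sequence, threshold):
    """Naive sliding-window search: return (position, score) for each match."""
    m = len(pssm)
    hits = []
    for i in range(len(sequence) - m + 1):
        score = sum(pssm[j][sequence[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

A typical use would compute the threshold once per matrix, for example `t = threshold_for_pvalue(pssm, background, 0.001)`, and then pass `t` to the search; the naive scan takes time proportional to the sequence length times the matrix length, which is the cost an index-based method aims to reduce.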
