Publication


Featured research published by Jens Keilwagen.


Nature Biotechnology | 2013

Evaluation of methods for modeling transcription factor sequence specificity

Matthew T. Weirauch; Raquel Norel; Matti Annala; Yue Zhao; Todd Riley; Julio Saez-Rodriguez; Thomas Cokelaer; Anastasia Vedenko; Shaheynoor Talukder; Phaedra Agius; Aaron Arvey; Philipp Bucher; Curtis G. Callan; Cheng Wei Chang; Chien-Yu Chen; Yong-Syuan Chen; Yu-Wei Chu; Jan Grau; Ivo Grosse; Vidhya Jagannathan; Jens Keilwagen; Szymon M. Kiełbasa; Justin B. Kinney; Holger Klein; Miron B. Kursa; Harri Lähdesmäki; Kirsti Laurila; Chengwei Lei; Christina S. Leslie; Chaim Linhart

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
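
As an illustration of the "simple models" mentioned above, the following is a minimal sketch of scanning a sequence with a mononucleotide position weight matrix. The motif probabilities, the uniform background of 0.25, and all function names are assumptions made for this example; they are not taken from the study.

```python
import math

# Hypothetical 4-column motif that prefers the site "TGAC"; the numbers are
# illustrative only and are not taken from the study.
MOTIF_PROBS = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]


def to_log_odds(probs, background=0.25):
    """Convert per-position base probabilities into log2-odds weights."""
    return [
        {base: math.log2(p / background) for base, p in column.items()}
        for column in probs
    ]


def score_site(pwm, site):
    """Sum the per-position weights; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on one strand."""
    width = len(pwm)
    return max(
        (score_site(pwm, sequence[s:s + width]), s)
        for s in range(len(sequence) - width + 1)
    )


PWM = to_log_odds(MOTIF_PROBS)
score, start = best_hit(PWM, "CCTGACGG")
print(start, round(score, 3))
```

Because each position contributes independently, such a model cannot express dependencies between neighboring bases, which is what the more complex models in the comparison add.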


Nucleic Acids Research | 2012

Toward the identification and regulation of the Arabidopsis thaliana ABI3 regulon

Gudrun Mönke; Michael Seifert; Jens Keilwagen; Michaela Mohr; Ivo Grosse; Urs Hähnel; Astrid Junker; Bernd Weisshaar; Udo Conrad; Helmut Bäumlein; Lothar Altschmied

The plant-specific, B3 domain-containing transcription factor ABSCISIC ACID INSENSITIVE3 (ABI3) is an essential component of the regulatory network controlling the development and maturation of the Arabidopsis thaliana seed. Genome-wide chromatin immunoprecipitation (ChIP-chip), transcriptome analysis, quantitative reverse transcriptase–polymerase chain reaction and a transient promoter activation assay have been combined to identify a set of 98 ABI3 target genes. Most of these presumptive ABI3 targets require the presence of abscisic acid for their activation and are specifically expressed during seed maturation. ABI3 target promoters are enriched for G-box-like and RY-like elements. The general occurrence of these cis motifs in non-ABI3 target promoters suggests the existence of as yet unidentified regulatory signals, some of which may be associated with epigenetic control. Several members of the ABI3 regulon are also regulated by other transcription factors, including the seed-specific, B3 domain-containing FUS3 and LEC2. The data strengthen and extend the notion that ABI3 is essential for the protection of embryonic structures from desiccation and raise pertinent questions regarding the specificity of promoter recognition.


Plant Journal | 2012

Elongation‐related functions of LEAFY COTYLEDON1 during the development of Arabidopsis thaliana

Astrid Junker; Gudrun Mönke; Twan Rutten; Jens Keilwagen; Michael Seifert; Tuyet Minh Nguyen Thi; Jean-Pierre Renou; Sandrine Balzergue; Prisca Viehöver; Urs Hähnel; Jutta Ludwig-Müller; Lothar Altschmied; Udo Conrad; Bernd Weisshaar; Helmut Bäumlein

The transcription factor LEAFY COTYLEDON1 (LEC1) controls aspects of early embryogenesis and seed maturation in Arabidopsis thaliana. To identify components of the LEC1 regulon, transgenic plants were derived in which LEC1 expression was inducible by dexamethasone treatment. The cotyledon-like leaves and swollen root tips developed by these plants contained seed-storage compounds and resemble the phenotypes produced by increased auxin levels. In agreement with this, LEC1 was found to mediate up-regulation of the auxin synthesis gene YUCCA10. Auxin accumulated primarily in the elongation zone at the root-hypocotyl junction (collet). This accumulation correlates with hypocotyl growth, which is either inhibited in LEC1-induced embryonic seedlings or stimulated in the LEC1-induced long-hypocotyl phenotype, therefore resembling etiolated seedlings. Chromatin immunoprecipitation analysis revealed a number of phytohormone- and elongation-related genes among the putative LEC1 target genes. LEC1 appears to be an integrator of various regulatory events, involving the transcription factor itself as well as light and hormone signalling, especially during somatic and early zygotic embryogenesis. Furthermore, the data suggest non-embryonic functions for LEC1 during post-germinative etiolation.


PLOS Computational Biology | 2011

De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

Jens Keilwagen; Jan Grau; Ivan A. Paponov; Stefan Posch; Marc Strickert; Ivo Grosse

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominantly positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in the case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.


Artificial Neural Networks in Pattern Recognition | 2008

Discriminatory Data Mapping by Matrix-Based Supervised Learning Metrics

Marc Strickert; Petra Schneider; Jens Keilwagen; Thomas Villmann; Michael Biehl; Barbara Hammer

Supervised attribute relevance detection using cross-comparisons (SARDUX), a recently proposed method for data-driven metric learning, is extended from dimension-weighted Minkowski distances to metrics induced by a data transformation matrix for modeling mutual attribute dependence. Given class labels, the parameters of this matrix are adapted in such a manner that the inter-class distances are maximized, while the intra-class distances are minimized. This results in an approach similar to Fisher's linear discriminant analysis (LDA); however, the involved distance matrix is optimized, and it can finally be utilized for generating discriminatory data mappings that outperform projection pursuit methods with LDA index. The power of matrix-based metric optimization is demonstrated for spectrum data and for cancer gene expression data.


Nucleic Acids Research | 2013

A general approach for discriminative de novo motif discovery from high-throughput data

Jan Grau; Stefan Posch; Ivo Grosse; Jens Keilwagen

De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.


PLOS ONE | 2014

Area under precision-recall curves for weighted and unweighted data

Jens Keilwagen; Ivo Grosse; Jan Grau

Precision-recall curves are highly informative about the performance of binary classifiers, and the area under these curves is a popular scalar performance measure for comparing different classifiers. However, for many applications class labels are not provided with absolute certainty, but with some degree of confidence, often reflected by weights or soft labels assigned to data points. Computing the area under the precision-recall curve requires interpolating between adjacent supporting points, but previous interpolation schemes are not directly applicable to weighted data. Hence, even in cases where weights were available, they had to be neglected for assessing classifiers using precision-recall curves. Here, we propose an interpolation for precision-recall curves that can also be used for weighted data, and we derive conditions for classification scores yielding the maximum and minimum area under the precision-recall curve. We investigate accordances and differences of the proposed interpolation and previous ones, and we demonstrate that taking into account existing weights of test data is important for the comparison of classifiers.
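
To make the role of weights concrete, here is a minimal sketch of how soft labels can enter precision and recall at a single score threshold. It assumes each weight in [0, 1] expresses the degree to which a data point belongs to the positive class; the function name and the convention of returning precision 1.0 when nothing is predicted positive are choices made for this example. The interpolation between curve points proposed in the paper is not reproduced here.

```python
def weighted_precision_recall(scores, weights, threshold):
    """Precision and recall at one threshold for soft-labeled data.

    Assumption for this sketch: weights[i] in [0, 1] is the degree to which
    point i is positive, so it adds weights[i] to the positive mass and
    1 - weights[i] to the negative mass.
    """
    predicted = [w for s, w in zip(scores, weights) if s >= threshold]
    true_pos = sum(predicted)                    # positive mass above threshold
    false_pos = sum(1.0 - w for w in predicted)  # negative mass above threshold
    total_pos = sum(weights)                     # positive mass overall

    # Convention chosen here: precision is 1.0 if nothing is predicted positive.
    precision = true_pos / (true_pos + false_pos) if predicted else 1.0
    recall = true_pos / total_pos if total_pos > 0 else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.3]
soft_labels = [1.0, 0.5, 0.0]
precision, recall = weighted_precision_recall(scores, soft_labels, threshold=0.5)
print(precision, recall)
```

With hard labels (weights of exactly 0 or 1), the same function reduces to the usual unweighted precision and recall.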


Bioinformatics | 2015

PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R

Jan Grau; Ivo Grosse; Jens Keilwagen

Summary: Precision-recall (PR) and receiver operating characteristic (ROC) curves are valuable measures of classifier performance. Here, we present the R-package PRROC, which allows for computing and visualizing both PR and ROC curves. In contrast to available R-packages, PRROC allows for computing PR and ROC curves and areas under these curves for soft-labeled data using a continuous interpolation between the points of PR curves. In addition, PRROC provides a generic plot function for generating publication-quality graphics of PR and ROC curves. Availability and implementation: PRROC is available from CRAN and is licensed under GPL 3.


Scientific Reports | 2015

Separating the wheat from the chaff – a strategy to utilize plant genetic resources from ex situ genebanks

Jens Keilwagen; Benjamin Kilian; Hakan Özkan; Steve Babben; Dragan Perovic; Klaus F. X. Mayer; Alexander Walther; C. Hart Poskar; Frank Ordon; Kellye Eversole; A. Börner; Martin W. Ganal; H. Knüpffer; Andreas Graner; Swetlana Friedel

The need for higher yielding and better-adapted crop plants for feeding the world's rapidly growing population has raised the question of how to systematically utilize large genebank collections with their wide range of largely untouched genetic diversity. Phenotypic data that have been recorded for decades during various rounds of seed multiplication provide a rich source of information. Their usefulness has remained limited though, due to various biases induced by conservation management over time or changing environmental conditions. Here, we present a powerful procedure that permits an unbiased trait-based selection of plant samples based on such phenotypic data. Applying this technique to the wheat collection of one of the largest genebanks worldwide, we identified groups of plant samples displaying contrasting phenotypes for selected traits. As a proof of concept for our discovery pipeline, we resequenced the entire major but conserved flowering time locus Ppd-D1 in just a few such selected wheat samples – and nearly doubled the number of hitherto known alleles.


Discrete Applied Mathematics | 2014

Exact algorithms and heuristics for the Quadratic Traveling Salesman Problem with an application in bioinformatics

Anja Fischer; Frank Fischer; Gerold Jäger; Jens Keilwagen; Paul Molitor; Ivo Grosse

In this paper we introduce an extension of the Traveling Salesman Problem (TSP), which is motivated by an important application in bioinformatics. In contrast to the TSP, the costs do not only depend on each pair of nodes traversed in succession in a cycle but on each triple of nodes traversed in succession. This problem can be formulated as optimizing a quadratic objective function over the traveling salesman polytope, so we call the combinatorial optimization problem quadratic TSP (QTSP). Besides its application in bioinformatics, the QTSP is a generalization of the Angular-Metric TSP and the TSP with reload costs. Apart from the TSP with quadratic cost structure we also consider the related Cycle Cover Problem with quadratic objective function (QCCP). In this work we present three exact solution approaches and several heuristics for the QTSP. The first exact approach is based on a polynomial transformation to a TSP, which is then solved by standard software. The second one is a branch-and-bound algorithm that relies on combinatorial bounds. The best exact algorithm is a branch-and-cut approach based on an integer programming formulation with problem-specific cutting planes. All heuristic approaches are extensions of classic heuristics for the TSP. Finally, we compare all algorithms on real-world instances from bioinformatics and on randomly generated instances. In these tests, the branch-and-cut approach turned out to be superior for solving the real-world instances from bioinformatics. Instances with up to 100 nodes could be solved to optimality in about ten minutes.
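
The triple-based cost structure described above can be sketched as follows. This is a brute-force enumeration that is only feasible for a handful of nodes; it is not one of the paper's three exact approaches, and the cost function `bend_cost` is a hypothetical example chosen for illustration.

```python
from itertools import permutations


def cycle_cost(tour, cost):
    """Sum cost(a, b, c) over every triple of consecutive nodes in the cycle."""
    n = len(tour)
    return sum(
        cost(tour[i], tour[(i + 1) % n], tour[(i + 2) % n]) for i in range(n)
    )


def brute_force_qtsp(nodes, cost):
    """Enumerate all cycles through the nodes (first node fixed) and keep the
    cheapest one. Runtime grows factorially, so this is for tiny instances."""
    first, rest = nodes[0], nodes[1:]
    best = min(
        ((first,) + perm for perm in permutations(rest)),
        key=lambda tour: cycle_cost(tour, cost),
    )
    return best, cycle_cost(best, cost)


def bend_cost(a, b, c):
    """Hypothetical triple cost: how far b deviates from the midpoint of a and c."""
    return abs(a - 2 * b + c)


best_tour, best_cost = brute_force_qtsp([0, 1, 2, 3], bend_cost)
print(best_tour, best_cost)
```

A cost that depends only on the first two arguments of each triple recovers the ordinary TSP objective, which shows in what sense the QTSP extends it.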
