Publication


Featured research published by Florian Schmidt.


Nucleic Acids Research | 2017

Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

Florian Schmidt; Nina Gasparoni; Gilles Gasparoni; Kathrin Gianmoena; Cristina Cadenas; Julia K. Polansky; Peter Ebert; Karl Nordström; Matthias Barann; Anupam Sinha; Sebastian Fröhler; Jieyi Xiong; Azim Dehghani Amirabad; Fatemeh Behjati Ardakani; Barbara Hutter; Gideon Zipprich; Bärbel Felder; Jürgen Eils; Benedikt Brors; Wei Chen; Jan G. Hengstler; Alf Hamann; Thomas Lengauer; Philip Rosenstiel; Jörn Walter; Marcel H. Schulz

The binding and contribution of transcription factors (TFs) to cell-specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, histone marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find that low-affinity binding sites improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively.
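The contrast between affinity-based scoring and presence/absence classification described in the abstract above can be illustrated with a minimal sketch. This is not TEPIC's implementation: the 3-bp matrix, the uniform background, the threshold, and all function names are hypothetical and chosen only to show why weak sites still contribute to an affinity score while a hit count ignores them.

```python
import math

# Hypothetical 3-bp position weight matrix (per-position base probabilities).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window):
    """Log-odds score of one window against the uniform background."""
    return sum(math.log(col[base] / BACKGROUND) for col, base in zip(PWM, window))


def region_affinity(seq):
    """Sum of exp(log-odds) over all windows, so weak sites also contribute."""
    k = len(PWM)
    return sum(math.exp(window_score(seq[i:i + k])) for i in range(len(seq) - k + 1))


def region_hits(seq, threshold):
    """Presence/absence view: count windows whose log-odds reach the threshold."""
    k = len(PWM)
    return sum(1 for i in range(len(seq) - k + 1) if window_score(seq[i:i + k]) >= threshold)


def gene_score(regions):
    """Sum of signal-weighted affinities over (sequence, signal) pairs."""
    return sum(signal * region_affinity(seq) for seq, signal in regions)


if __name__ == "__main__":
    # A near-consensus site is missed by the hit count but keeps a nonzero affinity.
    print(region_hits("ACT", 3.0), region_affinity("ACT"))
    print(gene_score([("ACG", 2.0), ("ACT", 0.5)]))
```

The sketch only handles the bases A, C, G and T; ambiguous bases such as N would need separate handling.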


Nucleic Acids Research | 2017

RegulatorTrail: a web service for the identification of key transcriptional regulators

Tim Kehl; Lara Schneider; Florian Schmidt; Daniel Stöckel; Nico Gerstner; Christina Backes; Eckart Meese; Andreas Keller; Marcel H. Schulz; Hans-Peter Lenhof

Transcriptional regulators such as transcription factors and chromatin modifiers play a central role in most biological processes. Alterations in their activities have been observed in many diseases, e.g. cancer. Hence, it is of utmost importance to evaluate and assess the effects of transcriptional regulators on natural and pathogenic processes. Here, we present RegulatorTrail, a web service that provides rich functionality for the identification and prioritization of key transcriptional regulators that have a strong impact on, e.g. pathological processes. RegulatorTrail offers eight methods that use regulator binding information in combination with transcriptomic or epigenomic data to infer the most influential regulators. Our web service not only provides an intuitive web interface, but also a well-documented RESTful API that allows for a straightforward integration into third-party workflows. The presented case studies highlight the capabilities of our web service and demonstrate its potential for the identification of influential regulators: we successfully identified regulators that might explain the increased malignancy in metastatic melanoma compared to primary tumors, as well as important regulators in macrophages. RegulatorTrail is freely accessible at: https://regulatortrail.bioinf.uni-sb.de/.


bioRxiv | 2017

Temporal epigenomic profiling identifies AHR as dynamic super-enhancer controlled regulator of mesenchymal multipotency

Déborah Gerard; Florian Schmidt; Aurélien Ginolhac; Martine Schmitz; Rashi Halder; Peter Ebert; Marcel H. Schulz; Thomas Sauter; Lasse Sinkkonen

Temporal data on gene expression and context-specific open chromatin states can improve identification of key transcription factors (TFs) and the gene regulatory networks (GRNs) controlling cellular differentiation. However, their integration remains challenging. Here, we delineate a general approach for data-driven and unbiased identification of key TFs and dynamic GRNs, called EPIC-DREM. We generated time-series transcriptomic and epigenomic profiles during differentiation of mouse multipotent bone marrow stromal cells (MSCs) towards adipocytes and osteoblasts. Using our novel approach we constructed time-resolved GRNs for both lineages. To prioritize the identified shared regulators, we mapped dynamic super-enhancers in both lineages and associated them to target genes with correlated expression profiles. We identified aryl hydrocarbon receptor (AHR) as a mesenchymal key TF controlled by a dynamic cluster of MSC-specific super-enhancers that become repressed in both lineages. AHR represses differentiation-induced genes such as Notch3 and we propose AHR to function as a guardian of mesenchymal multipotency.


F1000Research | 2018

Predicting transcription factor binding using ensemble random forest models

Fatemeh Behjati Ardakani; Florian Schmidt; Marcel H. Schulz

Background: Understanding the location and cell-type specific binding of Transcription Factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging, because TFs often bind only to short DNA motifs and cell-type specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif descriptions in the form of position specific energy matrices (PSEMs). Methods: We use TF ChIP-seq data as a gold-standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In the context of the ENCODE-DREAM in vivo TF binding site prediction challenge we consider different learning setups. Results: Our results indicate that the ensemble learning approach is able to better generalize across tissues and cell-types compared to individual tissue-specific classifiers or a classifier applied to the data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal. Conclusions: Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein-protein interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697).


Bioinformatics | 2018

On the problem of confounders in modeling gene expression

Florian Schmidt; Marcel H. Schulz

Motivation: Modeling of Transcription Factor (TF) binding from both ChIP‐seq and chromatin accessibility data has become prevalent in computational biology. Several models have been proposed to generate new hypotheses on transcriptional regulation. However, there is no distinct approach to derive TF binding scores from ChIP‐seq and open chromatin experiments. Here, we review biases of various scoring approaches and their effects on the interpretation and reliability of predictive gene expression models. Results: We generated predictive models for gene expression using ChIP‐seq and DNase1‐seq data from DEEP and ENCODE. Via randomization experiments, we identified confounders in TF gene scores derived from both ChIP‐seq and DNase1‐seq data. We reviewed correction approaches for both data types, which reduced the influence of identified confounders without harm to model performance. Also, our analyses highlighted further quality control measures, in addition to model performance, that may help to assure model reliability and to avoid misinterpretation in future studies. Availability and implementation: The software used in this study is available online at https://github.com/SchulzLab/TEPIC. Supplementary information: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2018

An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets

Florian Schmidt; Markus List; Engin Cukuroglu; Sebastian Köhler; Jonathan Göke; Marcel H. Schulz

Motivation: International consortia such as the Genotype‐Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenome Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disease mechanisms. However, utilizing all of these data effectively through integrative analysis is hampered by batch effects, large cell type heterogeneity and low replicate numbers. To study if batch effects across datasets can be observed and adjusted for, we analyze RNA‐seq data of 215 samples from ENCODE, Roadmap, BLUEPRINT and DEEP as well as 1336 samples from GTEx and TCGA. While batch effects are a considerable issue, it is non‐trivial to determine if batch adjustment leads to an improvement in data quality, especially in cases of low replicate numbers. Results: We present a novel method for assessing the performance of batch effect adjustment methods on heterogeneous data. Our method borrows information from the Cell Ontology to establish if batch adjustment leads to a better agreement between observed pairwise similarity and similarity of cell types inferred from the ontology. A comparison of state‐of‐the‐art batch effect adjustment methods suggests that batch effects in heterogeneous datasets with low replicate numbers cannot be adequately adjusted. Better methods need to be developed, which can be assessed objectively in the framework presented here. Availability and implementation: Our method is available online at https://github.com/SchulzLab/OntologyEval. Supplementary information: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2018

TEPIC 2 - An extended framework for transcription factor binding prediction and integrative epigenomic analysis

Florian Schmidt; Fabian Kern; Peter Ebert; Nina Baumgarten; Marcel H. Schulz

Summary: Prediction of transcription factor (TF) binding from epigenetics data and integrative analysis thereof are challenging. Here, we present TEPIC 2, a framework allowing for fast, accurate and versatile prediction and analysis of TF binding from epigenetics data: it supports 30 species with binding motifs, computes TF gene scores up to two orders of magnitude faster than before due to an improved implementation, and offers easy-to-use machine learning pipelines for integrated analysis of TF binding predictions with gene expression data, allowing the identification of important TFs. Availability and implementation: TEPIC is implemented in C++, R, and Python. It is freely available at https://github.com/SchulzLab/TEPIC and can be used on Linux-based systems. Supplementary information: Supplementary data are available at Bioinformatics online.


F1000Research | 2015

CausalTrail: Testing hypothesis using causal Bayesian networks

Daniel Stöckel; Florian Schmidt; Patrick Trampert; Hans-Peter Lenhof

Summary: Causal Bayesian networks are a special class of Bayesian networks in which the hierarchy directly encodes the causal relationships between the variables. This makes it possible to compute the effect of interventions, which are external changes to the system caused by, e.g., gene knockouts or an administered drug. Whereas numerous packages for constructing causal Bayesian networks are available, hardly any program targeted at downstream analysis exists. In this paper we present CausalTrail, a tool for performing reasoning on causal Bayesian networks using the do-calculus. CausalTrail's features include multiple data import methods, a flexible query language for formulating hypotheses, as well as an intuitive graphical user interface. The program is able to account for missing data and thus can be readily applied in multi-omics settings where it is common that not all measurements are performed for all samples. Availability and implementation: CausalTrail is implemented in C++ using the Boost and Qt5 libraries. It can be obtained from https://github.com/dstoeckel/causaltrail
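The difference between observing a variable and intervening on it, as described in the abstract above, can be shown on a tiny example. The following sketch is not CausalTrail's algorithm or query language: the three-node network (Z -> X, Z -> Y, X -> Y), its probability tables, and the function names are all hypothetical. It computes the interventional distribution with the standard truncated factorization, in which the factor for the intervened variable is dropped.

```python
# Hypothetical binary network: Z -> X, Z -> Y, X -> Y.
# Z confounds the relationship between X (e.g. a gene knockout) and Y.
P_Z1 = 0.5                                  # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                           # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_z(z):
    return P_Z1 if z == 1 else 1.0 - P_Z1


def p_x_given_z(x, z):
    p = P_X1_GIVEN_Z[z]
    return p if x == 1 else 1.0 - p


def p_y1_given_x(x):
    """Observational query P(Y = 1 | X = x): condition on X, marginalize Z."""
    joint = sum(p_z(z) * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))
    marginal = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return joint / marginal


def p_y1_do_x(x):
    """Interventional query P(Y = 1 | do(X = x)): the factor P(X | Z) is removed."""
    return sum(p_z(z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


if __name__ == "__main__":
    print("P(Y=1 | X=1)     =", p_y1_given_x(1))
    print("P(Y=1 | do(X=1)) =", p_y1_do_x(1))
```

With these made-up tables the two queries disagree, because conditioning on X also shifts the distribution of the confounder Z, whereas the intervention leaves Z untouched.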


Immunity | 2016

Epigenomic profiling of human CD4+ T cells supports a linear differentiation model and highlights molecular regulators of memory development

Pawel Durek; Karl Nordström; Gilles Gasparoni; Abdulrahman Salhab; Christopher Kressler; Melanie de Almeida; Kevin Bassler; Thomas Ulas; Florian Schmidt; Jieyi Xiong; Petar Glažar; Filippos Klironomos; Anupam Sinha; Sarah Kinkley; Xinyi Yang; Laura Arrigoni; Azim Dehghani Amirabad; Fatemeh Behjati Ardakani; Lars Feuerbach; Oliver Gorka; Peter Ebert; Fabian Müller; Na Li; Stefan Frischbutter; Stephan Schlickeiser; Carla Cendon; Sebastian Fröhler; Bärbel Felder; Nina Gasparoni; Charles D. Imbusch


Research in Computational Molecular Biology | 2018

Inferring Gene Regulatory Programs with Mixtures of Sparse Multi-Task Regression Models

Tobias Heinen; Azim Dehghani Amirabad; Florian Schmidt; Marcel H. Schulz

Collaboration


Dive into Florian Schmidt's collaborations.

Top Co-Authors


Jieyi Xiong

Max Delbrück Center for Molecular Medicine
