Norbert Dojer
University of Warsaw
Publication
Featured research published by Norbert Dojer.
BMC Bioinformatics | 2006
Norbert Dojer; Anna Gambin; Andrzej Mizera; Bartek Wilczynski; Jerzy Tiuryn
Background: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which lets them handle the stochastic aspects of gene expression and noisy measurements in a natural way, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge, the DBN technique has not been applied in the context of perturbation experiments. Results: We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed, and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulated data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. Conclusion: We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when the considered set of genes is small enough.
Bioinformatics | 2009
Bartosz Wilczyński; Norbert Dojer
Motivation: Bayesian methods are widely used in many different areas of research. Recently, they have become a very popular tool for biological network reconstruction, due to their ability to handle noisy data. Even though there are many software packages allowing for Bayesian network reconstruction, only a few of them are freely available to researchers. Moreover, they usually require at least basic programming abilities, which restricts their potential user base. Our goal was to provide software which would be freely available, efficient and usable by non-programmers. Results: We present BNFinder, software that allows for Bayesian network reconstruction from experimental data. It supports dynamic Bayesian networks and, if the variables are partially ordered, also static Bayesian networks. The main advantage of BNFinder is the use of an exact algorithm, which is at the same time very efficient (polynomial with respect to the number of observations). Availability: The software, supplementary information and manual are available at http://bioputer.mimuw.edu.pl/software/bnf/. Besides the standalone application and the source code, we have developed a web interface to the BNFinder application running on our servers. A web tutorial on different options of BNFinder is also available. Contact: [email protected]
Mathematical Foundations of Computer Science | 2006
Norbert Dojer
We propose an algorithm for learning an optimal Bayesian network from data. Our method targets biological applications, where datasets are usually small but sets of random variables are large. Moreover, we assume that there is no need to examine the acyclicity of the graph. We provide polynomial bounds (with respect to the number of random variables) on the time complexity of our algorithm for two commonly used scoring criteria: Minimal Description Length and Bayesian-Dirichlet equivalence.
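When acyclicity does not have to be checked (as in dynamic networks, where edges point from time t to time t+1), each variable's parent set can be scored and chosen independently of the others. The following is a minimal sketch of that idea using an MDL-style score on discrete data. It is not the algorithm from the paper: the function names `mdl_score` and `best_parents`, the hard `max_parents` cap, and the toy data are assumptions made for illustration, and the exhaustive search stands in for the score-based bounds the abstract refers to.

```python
import itertools
import math
from collections import Counter


def mdl_score(rows, child, parents, arity=2):
    """MDL score (lower is better) of column `child` given columns `parents`."""
    n = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    # Data cost: negative log-likelihood under maximum-likelihood parameters.
    data_cost = -sum(c * math.log(c / parent_counts[pa])
                     for (pa, _), c in joint.items())
    # Model cost: (log n)/2 per free parameter of the conditional table.
    free_params = (arity - 1) * arity ** len(parents)
    return data_cost + 0.5 * math.log(n) * free_params


def best_parents(rows, child, candidates, max_parents=2, arity=2):
    """Exhaustively pick the lowest-scoring parent set for one variable."""
    best_set, best_score = (), mdl_score(rows, child, (), arity)
    for size in range(1, max_parents + 1):
        for parents in itertools.combinations(candidates, size):
            s = mdl_score(rows, child, parents, arity)
            if s < best_score:
                best_set, best_score = parents, s
    return best_set, best_score


# Toy data: columns 0 and 1 are values at time t, column 2 is the child at t+1.
# The child simply copies column 0, so column 0 alone should be selected.
rows = [(i % 2, (i // 2) % 2, i % 2) for i in range(16)]
parents, score = best_parents(rows, child=2, candidates=[0, 1])
print(parents, round(score, 3))
```

Because the search for one variable never looks at the parent sets chosen for the others, the loop over variables is trivially independent; the penalty term is what keeps the larger parent set from winning on the toy data.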
BMC Bioinformatics | 2009
Bartosz Wilczyński; Norbert Dojer; Mateusz Patelak; Jerzy Tiuryn
Background: Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast, and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult. Results: We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms. Conclusion: We show that our approach enables us to find a majority of the known cis-regulatory modules (CRMs) using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.
Bioinformatics | 2013
Norbert Dojer; Paweł Bednarz; Agnieszka Podsiadło; Bartek Wilczynski
Summary: Bayesian Networks (BNs) are versatile probabilistic models applicable to many different biological phenomena. In biological applications the structure of the network is usually unknown and needs to be inferred from experimental data. BNFinder is a fast software implementation of an exact algorithm for finding the optimal structure of the network given a number of experimental observations. Its second version, presented in this article, represents a major improvement over the previous version. The improvements include (i) a parallelized learning algorithm leading to order-of-magnitude speed-ups in BN structure learning time; (ii) inclusion of an additional scoring function based on mutual information criteria; (iii) the possibility of choosing the resulting network specificity based on statistical criteria and (iv) a new module for classification by BNs, including a cross-validation scheme and classifier quality measurements with receiver operating characteristic scores. Availability and implementation: BNFinder2 is implemented in Python and freely available under the GNU General Public License at the project Web site https://launchpad.net/bnfinder, together with a user’s manual, introductory tutorial and supplementary methods. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Algorithms for Molecular Biology | 2014
Michał Modzelewski; Norbert Dojer
Background: Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, resulting alignments are biased by guide-trees, especially for relatively distant sequences. Results: We propose MSARC, a new graph-clustering based algorithm that aligns sequence sets without guide-trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to the best progressive methods. Furthermore, MSARC outperforms them on sequence sets whose evolutionary distances are difficult to represent by a phylogenetic tree. These datasets are most exposed to the guide-tree bias of alignments. Availability: MSARC is available at http://bioputer.mimuw.edu.pl/msarc
BMC Systems Biology | 2010
Michal Dabrowski; Norbert Dojer; Malgorzata Zawadzka; Jakub Mieczkowski; Bozena Kaminska
Background: It is often desirable to separate effects of different regulators on gene expression, or to identify effects of the same regulator across several systems. Here, we focus on the rat brain following stroke or seizures, and demonstrate how the two tasks can be approached simultaneously. Results: We applied SVD to time-series gene expression datasets from the rat experimental models of stroke and seizures. We demonstrate conservation of two eigensystems, reflecting inflammation and/or apoptosis (eigensystem 2) and neuronal synaptic activity (eigensystem 3), between stroke and seizures. We analyzed cis-regulation of gene expression in the subspaces of the conserved eigensystems. Bayesian network analysis was performed separately for each experimental model, with cross-system validation of the highest-ranking features. In this way, we correctly re-discovered the role of AP1 in the regulation of apoptosis, and the involvement of Creb and Egr in the regulation of synaptic activity-related genes. We identified a novel antagonistic effect of the motif recognized by the nuclear matrix attachment region-binding protein Satb1 on AP1-driven transcriptional activation, suggesting a link between chromatin loop structure and gene activation by AP1. The effects of motifs binding Satb1 and Creb on gene expression in brain conform to the assumption of the linear response model of gene regulation. Our data also suggest that numerous enhancers of neuronal-specific genes are important for their responsiveness to synaptic activity. Conclusion: Eigensystems conserved between stroke and seizures separate effects of inflammation/apoptosis and neuronal synaptic activity, exerted by different transcription factors, on gene expression in rat brain.
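To make the notion of an eigensystem concrete: in an SVD of a genes × time-points expression matrix, each singular triplet pairs a temporal pattern (one value per time point) with per-gene loadings and a singular value. The sketch below recovers only the leading triplet by power iteration in pure Python, as a stand-in for a full SVD routine. The function name `top_eigensystem`, the toy matrix, and the use of power iteration are assumptions for illustration and do not reproduce the analysis in the paper.

```python
import math


def top_eigensystem(matrix, iterations=200):
    """Leading singular triplet of a genes x time-points matrix.

    Returns (sigma, gene_loadings, time_pattern). Assumes the all-positive
    starting vector is not orthogonal to the leading temporal pattern.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # One power-iteration step on A^T A: u = A v, then v = A^T u, normalized.
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


# Toy data: three genes following one shared temporal pattern with
# different loadings (a rank-1 matrix, so a single eigensystem explains it).
pattern = [1.0, 2.0, 3.0, 2.0]
loadings = [1.0, -2.0, 0.5]
expression = [[g * t for t in pattern] for g in loadings]

sigma, gene_loadings, time_pattern = top_eigensystem(expression)
# Fraction of the total signal captured by this eigensystem.
explained = sigma ** 2 / sum(x * x for row in expression for x in row)
print(round(sigma, 4), [round(x, 4) for x in time_pattern], round(explained, 4))
```

Genes with large positive or negative entries in `gene_loadings` are the ones that follow (or mirror) the recovered temporal pattern, which is the sense in which an analysis can be restricted to the subspace of one eigensystem.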
BMC Bioinformatics | 2015
Michal Dabrowski; Norbert Dojer; Izabella Krystkowiak; Bozena Kaminska; Bartek Wilczynski
Background: For many years now, binding preferences of Transcription Factors have been described by so-called motifs, usually mathematically defined by position weight matrices or similar models, for the purpose of predicting potential binding sites. However, despite the availability of thousands of motif models in public and commercial databases, a researcher who wants to use them is left with many competing methods of identifying potential binding sites in a genome of interest, and there is little published information regarding the optimality of different choices. Thanks to the availability of a large number of different motif models as well as a number of experimental datasets describing actual binding of TFs in hundreds of TF-ChIP-seq pairs, we set out to perform a comprehensive analysis of this matter. Results: We focus on the task of identifying potential transcription factor binding sites in the human genome. Firstly, we provide a comprehensive comparison of the coverage and quality of models available in different databases, showing that the public databases have comparable TF coverage and better motif performance than commercial databases. Secondly, we compare different motif scanners, showing that, regardless of the database used, the tools developed by the scientific community outperform the commercial tools. Thirdly, we calculate for each motif a detection threshold optimizing the accuracy of prediction. Finally, we provide an in-depth comparison of different methods of choosing thresholds for all motifs a priori. Surprisingly, we show that selecting a common false-positive rate gives results that are the least biased by the information content of the motif and therefore most uniformly accurate. Conclusion: We provide a guide for researchers working with transcription factor motifs. It is supplemented with detailed results of the analysis and the benchmark datasets at http://bioputer.mimuw.edu.pl/papers/motifs/.
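One way to read "selecting a common false-positive rate" is: for each motif, pick the score threshold at which a fixed fraction of background sequence windows would be reported as hits. The sketch below illustrates this for a short motif by enumerating every k-mer under a uniform background. It is an assumption-laden illustration, not the benchmark pipeline from the paper: the helper names (`log_odds`, `score`, `threshold_for_fpr`, `scan`), the pseudocount, the uniform background, and the toy motif are all hypothetical, and exhaustive enumeration is only feasible for short motifs.

```python
import itertools
import math
from collections import Counter

BASES = "ACGT"


def log_odds(probabilities, background=0.25, pseudocount=0.01):
    """Turn per-position base probabilities into log-odds scores (a PWM)."""
    return [{b: math.log2((col[b] + pseudocount) / (1 + 4 * pseudocount)
                          / background) for b in BASES}
            for col in probabilities]


def score(pwm, kmer):
    """Sum of per-position scores, rounded so equal scores compare equal."""
    return round(sum(col[b] for col, b in zip(pwm, kmer)), 9)


def threshold_for_fpr(pwm, fpr):
    """Lowest score whose hit rate over all uniform k-mers stays <= fpr."""
    counts = Counter(score(pwm, k)
                     for k in itertools.product(BASES, repeat=len(pwm)))
    total = sum(counts.values())
    threshold, passed = math.inf, 0
    for s in sorted(counts, reverse=True):
        if (passed + counts[s]) / total > fpr:
            break
        passed += counts[s]
        threshold = s
    return threshold


def scan(pwm, sequence, threshold):
    """Start positions of windows scoring at or above the threshold."""
    k = len(pwm)
    return [i for i in range(len(sequence) - k + 1)
            if score(pwm, sequence[i:i + k]) >= threshold]


# Toy motif with consensus ACG: 0.97 for the consensus base, 0.01 otherwise.
consensus = "ACG"
probabilities = [{b: (0.97 if b == c else 0.01) for b in BASES}
                 for c in consensus]
pwm = log_odds(probabilities)
threshold = threshold_for_fpr(pwm, fpr=0.02)
hits = scan(pwm, "TTACGTT", threshold)
print(threshold, hits)
```

Applying the same `fpr` to every motif yields a different score threshold per motif, which is how a common false-positive rate differs from a common score cutoff.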
International Journal of Approximate Reasoning | 2016
Norbert Dojer
The current paper addresses two problems observed in structure learning applications to computational biology. The first one is dealing with mixed data. Most optimization criteria for learning algorithms are applicable to either discrete or continuous data. Mixed datasets are usually handled by discretization of continuous data, which often leads to the loss of information. In order to address this problem, we adapted discrete scoring functions to continuous data. Consequently, the same score is applied to both types of variables, and the network structure may be learned from mixed data directly. The second problem is the control of the type I error level. Usually, learning algorithms output a network that is the best according to some optimization criteria, but the reliability of particular relationships represented by this network is unknown. We address this problem by allowing the user to specify the expected error level and adjusting the parameters of the scoring criteria to this level. A method for adapting discrete scoring functions to continuous variables is proposed. Our method may be applied to datasets joining continuous and discrete variables. A method for controlling the type I error level is proposed.
Workshop on Algorithms in Bioinformatics | 2013
Michał Modzelewski; Norbert Dojer
Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, resulting alignments are biased by guide-trees, especially for relatively distant sequences.