Publication


Featured research published by Miron B. Kursa.


Nature Biotechnology | 2013

Evaluation of methods for modeling transcription factor sequence specificity

Matthew T. Weirauch; Raquel Norel; Matti Annala; Yue Zhao; Todd Riley; Julio Saez-Rodriguez; Thomas Cokelaer; Anastasia Vedenko; Shaheynoor Talukder; Phaedra Agius; Aaron Arvey; Philipp Bucher; Curtis G. Callan; Cheng Wei Chang; Chien-Yu Chen; Yong-Syuan Chen; Yu-Wei Chu; Jan Grau; Ivo Grosse; Vidhya Jagannathan; Jens Keilwagen; Szymon M. Kiełbasa; Justin B. Kinney; Holger Klein; Miron B. Kursa; Harri Lähdesmäki; Kirsti Laurila; Chengwei Lei; Christina S. Leslie; Chaim Linhart

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.


Fundamenta Informaticae | 2010

Boruta - A System for Feature Selection

Miron B. Kursa; Aleksander Jankowski; Witold R. Rudnicki

Machine learning methods are often used to classify objects described by hundreds of attributes; in many applications of this kind a great fraction of attributes may be totally irrelevant to the classification problem. Moreover, one usually cannot decide a priori which attributes are relevant. In this paper we present an improved version of the algorithm for identification of the full set of truly important variables in an information system. It is an extension of the random forest method which utilises the importance measure generated by the original algorithm. It compares, in an iterative fashion, the importances of original attributes with the importances of their randomised copies. We analyse the performance of the algorithm on several examples of synthetic data, as well as on a biologically important problem, namely the identification of the sequence motifs that are important for aptameric activity of short RNA sequences.
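
The comparison of original attributes against their randomised copies, as described in the abstract above, can be illustrated with a minimal Python sketch. This is not the Boruta implementation. As labeled assumptions, it swaps the random forest importance for a plain absolute Pearson correlation, and it replaces the algorithm's iterative decision procedure with a fixed number of rounds and a hit-rate threshold. The names `abs_correlation`, `shadow_feature_selection`, `rounds`, and `min_hit_rate` are hypothetical.

```python
import random
from statistics import mean, pstdev


def abs_correlation(xs, ys):
    """Stand-in importance measure: absolute Pearson correlation.

    Assumption: used here only for brevity; the paper relies on the
    random forest importance measure instead.
    """
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(cov / (sx * sy))


def shadow_feature_selection(columns, target, rounds=30, min_hit_rate=0.9, seed=0):
    """Keep attributes that repeatedly beat the best randomised copy.

    columns: dict mapping attribute name -> list of numeric values.
    target:  list of numeric labels, same length as each column.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(rounds):
        # Shuffling a column destroys its link to the target, so the best
        # shuffled score estimates the importance reachable by chance.
        shadow_best = 0.0
        for values in columns.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_best = max(shadow_best, abs_correlation(shuffled, target))
        # An original attribute scores a hit when it beats every shadow.
        for name, values in columns.items():
            if abs_correlation(values, target) > shadow_best:
                hits[name] += 1
    return [name for name, count in hits.items() if count / rounds >= min_hit_rate]
```

Calling `shadow_feature_selection(columns, target)` returns the names of the attributes that beat the best randomised copy in at least `min_hit_rate` of the rounds. A correlation score only detects linear, single-attribute effects, which is one reason a forest-based importance is the more natural choice in practice.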


BMC Bioinformatics | 2014

Robustness of Random Forest-based gene selection methods

Miron B. Kursa

Background: Gene selection is an important part of microarray data analysis because it provides information that can lead to a better mechanistic understanding of an investigated phenomenon. At the same time, gene selection is very difficult because of the noisy nature of microarray data. As a consequence, gene selection is often performed with machine learning methods. The Random Forest method is particularly well suited for this purpose. In this work, four state-of-the-art Random Forest-based feature selection methods were compared in a gene selection context. The analysis focused on the stability of selection because, although it is necessary for determining the significance of results, it is often ignored in similar studies. Results: The comparison of post-selection accuracy of a validation of Random Forest classifiers revealed that all investigated methods were equivalent in this context. However, the methods substantially differed with respect to the number of selected genes and the stability of selection. Of the analysed methods, the Boruta algorithm predicted the most genes as potentially important. Conclusions: The post-selection classifier error rate, which is a frequently used measure, was found to be a potentially deceptive measure of gene selection quality. When the number of consistently selected genes was considered, the Boruta algorithm was clearly the best. Although it was also the most computationally intensive method, the Boruta algorithm’s computational demands could be reduced to levels comparable to those of other algorithms by replacing the Random Forest importance with a comparable measure from Random Ferns (a similar but simplified classifier). Despite their design assumptions, the minimal optimal selection methods were found to select a high fraction of false positives.


Physical Review B | 2011

Cascade of vortex loops initiated by a single reconnection of quantum vortices

Miron B. Kursa; Konrad Bajer; Tomasz Lipniacki

We demonstrate that a single reconnection of two quantum vortices can lead to the creation of a cascade of vortex rings. Our analysis involves the localized induction approximation, high-resolution Biot-Savart and Gross-Pitaevskii simulations. The latter showed that the ring cascade starts on the atomic scale, with ring diameters orders of magnitude smaller than the characteristic line spacing in the tangle. Vortex rings created in the cascades may penetrate the tangle and annihilate on the boundaries. This provides an efficient decay mechanism for a sparse or moderately dense vortex tangle at very low temperatures.


Cytometry Part A | 2016

A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes

Nima Aghaeepour; Pratip K. Chattopadhyay; Maria Chikina; Tom Dhaene; Sofie Van Gassen; Miron B. Kursa; Bart N. Lambrecht; Mehrnoush Malek; Geoffrey J. McLachlan; Yu Qian; Peng Qiu; Yvan Saeys; Rick Stanton; Dong Tong; Celine Vens; Slawomir Walkowiak; Kui Wang; Greg Finak; Raphael Gottardo; Tim R. Mosmann; Garry P. Nolan; Richard H. Scheuermann; Ryan R. Brinkman

The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of computational methods for identifying cell populations in multidimensional flow cytometry data. Here we report the results of FlowCAP‐IV where algorithms from seven different research groups predicted the time to progression to AIDS among a cohort of 384 HIV+ subjects, using antigen‐stimulated peripheral blood mononuclear cell (PBMC) samples analyzed with a 14‐color staining panel. Two approaches (FlowReMi.1 and flowDensity‐flowType‐RchyOptimyx) provided statistically significant predictive value in the blinded test set. Manual validation of submitted results indicated that unbiased analysis of single cell phenotypes could reveal unexpected cell types that correlated with outcomes of interest in high dimensional flow cytometry datasets.


International Symposium on Methodologies for Intelligent Systems | 2009

Musical Instruments in Random Forest

Miron B. Kursa; Witold R. Rudnicki; Alicja Wieczorkowska; Elżbieta Kubera; Agnieszka Kubik-Komar

This paper describes automatic classification of the predominant musical instrument in sound mixes, using random forests as classifiers. The sound parameterization applied and the methodology of random forest classification are described in the paper. Additionally, the significance of the sound parameters used as conditional attributes is investigated. The results show that almost all sound attributes are informative, and that the random forest technique yields much higher classification results than support vector machines, used in previous research on these data.


RSCTC'10: Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing | 2010

Random musical bands playing in random forests

Miron B. Kursa; Elżbieta Kubera; Witold R. Rudnicki; Alicja Wieczorkowska

In this paper we investigate the problem of recognizing the full set of instruments playing in a sound mix. Random mixes of 2-5 instruments (out of 14) were created and parameterized to obtain experimental data. Sound samples were taken from 3 audio data sets. For classification purposes, we used a battery of one-instrument sensitive random forest classifiers, and obtained quite good results.


International Symposium on Methodologies for Intelligent Systems | 2011

All that jazz in the random forest

Elżbieta Kubera; Miron B. Kursa; Witold R. Rudnicki; Radosław Rudnicki; Alicja Wieczorkowska

In this paper, we address the problem of automatic identification of instruments in audio records, in a frame-by-frame manner. Random forests have been chosen as a classifier. Training data represent sounds of selected instruments which originate from three commonly used repositories, namely McGill University Master Samples, The University of Iowa Musical Instrument Samples, and RWC, as well as from recordings by one of the authors. Testing data represent audio records especially prepared for research purposes, and then carefully labeled (annotated). The experiments on identification of instruments on a frame-by-frame basis and the obtained results are presented and discussed in the paper.


ICMMI | 2011

A Deceiving Charm of Feature Selection: The Microarray Case Study

Miron B. Kursa; Witold R. Rudnicki

Microarray analysis has become a significant use of machine learning in molecular biology. Datasets obtained from this method consist of tens of thousands of attributes usually describing tens of objects. Such a setting makes the use of some form of feature selection an inevitable step of analysis, mostly to reduce the feature set to a manageable size, but also to obtain biological insight into the mechanisms of the investigated process. In this paper we present a reanalysis of a previously published late radiation toxicity prediction problem. On that lurid example we show how futile it may be to rely on non-validated feature selection and how even advanced algorithms fail to distinguish between noise and signal when the latter is weak. We also propose methods of detecting and dealing with the mentioned problems.


Genome Research | 2013

Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach

Pablo Meyer; Geoffrey H. Siwo; Danny Zeevi; Eilon Sharon; Raquel Norel; Eran Segal; Gustavo Stolovitzky; Andrew K. Rider; Asako Tan; Richard S. Pinapati; Scott J. Emrich; Nitesh V. Chawla; Michael T. Ferdig; Yi-An Tung; Yong-Syuan Chen; Mei-Ju May Chen; Chien-Yu Chen; Jason M. Knight; Sayed Mohammad Ebrahim Sahraeian; Mohammad Shahrokh Esfahani; René Dreos; Philipp Bucher; Ezekiel Maier; Yvan Saeys; Ewa Szczurek; Alena Myšičková; Martin Vingron; Holger Klein; Szymon M. Kiełbasa; Jeff Knisley

The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of the yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.

Collaboration


Dive into Miron B. Kursa's collaborations.

Top Co-Authors

Alicja Wieczorkowska

University of North Carolina at Charlotte

Elżbieta Kubera

University of Life Sciences in Lublin

Adam Hamed

Nencki Institute of Experimental Biology

Agnieszka Kubik-Komar

University of Life Sciences in Lublin

Tomasz Lipniacki

Polish Academy of Sciences
