Michał Dramiński
Polish Academy of Sciences
Publication
Featured research published by Michał Dramiński.
Bioinformatics | 2008
Michał Dramiński; Alvaro Rada-Iglesias; Stefan Enroth; Claes Wadelius; Jacek Koronacki; Jan Komorowski
MOTIVATION Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the observed features. RESULTS The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a significantly different list of genes. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancers, which is the case for the other methods. AVAILABILITY Prototype available upon request.
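The resampling idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses one-split decision stumps as a simplified stand-in for full tree classifiers, and scores a feature by the held-out accuracy of the stumps that chose it. The function names, parameters, scoring rule, and synthetic data are all assumptions made for the example.

```python
import random

def fit_stump(X, y, feats):
    """Best one-split rule over `feats`: (feature, threshold, label_above)."""
    best, best_hits = None, -1
    for f in feats:
        vals = sorted({row[f] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            # Correct predictions if we predict class 1 above the threshold.
            hits = sum((row[f] > t) == (label == 1) for row, label in zip(X, y))
            # Also try the opposite orientation (class 0 above the threshold).
            for label_above, h in ((1, hits), (0, len(X) - hits)):
                if h > best_hits:
                    best, best_hits = (f, t, label_above), h
    return best

def mc_rank(X, y, n_rounds=300, n_feats=2, train_frac=0.66, seed=0):
    """Rank features by accumulated held-out accuracy over random rounds."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    importance = [0.0] * d
    for _ in range(n_rounds):
        feats = rng.sample(range(d), n_feats)   # random fraction of the features
        idx = rng.sample(range(n), n)           # shuffled sample indices
        cut = int(train_frac * n)
        train, test = idx[:cut], idx[cut:]      # random train/test split
        stump = fit_stump([X[i] for i in train], [y[i] for i in train], feats)
        if stump is None:
            continue
        f, t, label_above = stump
        correct = sum(
            (label_above if X[i][f] > t else 1 - label_above) == y[i]
            for i in test
        )
        importance[f] += correct / len(test)    # credit the feature that was used
    ranking = sorted(range(d), key=lambda j: importance[j], reverse=True)
    return ranking, importance

# Synthetic data (hypothetical): feature 0 is informative, features 1-5 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[label * 3.0 + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(5)]
     for label in y]

ranking, importance = mc_rank(X, y)
print("feature ranking:", ranking)
```

A real tree classifier would split repeatedly and could credit several features per round; the stump keeps the example short while preserving the two sources of randomness named in the abstract, random training sets and random feature subsets.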
Bioinformatics and Biology Insights | 2009
Marcin Kierczak; Krzysztof Ginalski; Michał Dramiński; Jacek Koronacki; Witold R. Rudnicki; Jan Komorowski
Reverse transcriptase (RT) is a viral enzyme crucial for HIV-1 replication. Currently, 12 drugs are targeted against the RT. The low fidelity of the RT-mediated transcription leads to the quick accumulation of drug-resistance mutations. The sequence-resistance relationship remains only partially understood. Using publicly available data collected from over 15 years of HIV proteome research, we have created a general and predictive rule-based model of HIV-1 resistance to eight RT inhibitors. Our rough set-based model considers changes in the physicochemical properties of a mutated sequence as compared to the wild-type strain. Thanks to the application of the Monte Carlo feature selection method, the model takes into account only the properties that significantly contribute to the resistance phenomenon. The obtained results show that drug resistance is determined in a more complex way than believed. We confirmed the importance of many resistance-associated sites, found some sites to be less relevant than formerly postulated and, more importantly, identified several previously neglected sites as potentially relevant. By mapping some of the newly discovered sites on the 3D structure of the RT, we were able to suggest possible molecular mechanisms of drug resistance. Importantly, our model has the ability to generalize predictions to previously unseen cases. The study is an example of how computational biology methods can increase our understanding of the HIV-1 resistome.
Advances in Machine Learning II | 2010
Michał Dramiński; Marcin Kierczak; Jacek Koronacki; Jan Komorowski
Applications of machine learning techniques in Life Sciences are the main applications forcing a paradigm shift in the way these techniques are used. Rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying observations into distinct classes and what the interdependencies between the features are. To this end we significantly extend our earlier work [Draminski et al. (2008)] that introduced an effective and reliable method for ranking features according to their importance for classification. We begin by adding a method for finding a cut-off between informative and non-informative features and then continue with the development of a methodology and an implementation of a procedure for determining interdependencies between informative features. The reliability of our approach rests on multiple construction of tree classifiers. Essentially, each classifier is trained on a randomly chosen subset of the original data using only a fraction of all of the observed features. This approach is conceptually simple yet computer-intensive. The methodology is validated on a large and difficult task of modelling HIV-1 reverse transcriptase resistance to drugs, which is a good example of the aforementioned paradigm shift. In this task, the main interest is the identification of mutation points (i.e. features) and their combinations that model drug resistance.
Fundamenta Informaticae | 2013
Marcin Kruczyk; Nicholas Baltzer; Jakub Mieczkowski; Michał Dramiński; Jacek Koronacki; Jan Komorowski
An important step prior to constructing a classifier for a very large data set is feature selection. With many problems it is possible to find a subset of attributes that have the same discriminative power as the full data set. There are many feature selection methods but in none of them are Rough Set models tied up with statistical argumentation. Moreover, known methods of feature selection usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts (RR), a feature selection method which precedes classification per se. The method is based on the Monte Carlo Feature Selection (MCFS) layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features of which the user should be made aware, and to find interactions in the data set.
Bioinformatics and Biology Insights | 2010
Marcin Kierczak; Michał Dramiński; Jacek Koronacki; Jan Komorowski
Motivation Despite more than two decades of research, HIV resistance to drugs remains a serious obstacle in developing efficient AIDS treatments. Several computational methods have been developed to predict resistance level from the sequence of viral proteins such as reverse transcriptase (RT) or protease. These methods, while powerful and accurate, give very little insight into the molecular interactions that underlie acquisition of drug resistance/hypersusceptibility. Here, we attempt to fill this gap by using our Monte Carlo feature selection and interdependency discovery method (MCFS-ID) to elucidate molecular interaction networks that characterize viral strains with altered drug resistance levels. Results We analyzed a number of HIV-1 RT sequences annotated with drug resistance level using the MCFS-ID method. This let us expound interdependency networks that characterize change of drug resistance to six selected RT inhibitors: Abacavir, Lamivudine, Stavudine, Zidovudine, Tenofovir and Nevirapine. The networks consider interdependencies at the level of physicochemical properties of mutating amino acids, e.g., polarity. We mapped each network on the 3D structure of RT in an attempt to understand the molecular meaning of interacting pairs. The discovered interactions describe several known drug resistance mechanisms and, importantly, some previously unidentified ones. Our approach can be easily applied to a whole range of problems from the domain of protein engineering. Availability A portable Java implementation of our MCFS-ID method is freely available for academic users and can be obtained at: http://www.ipipan.eu/staff/m.draminski/software.htm.
intelligent information systems | 2005
Krzysztof Ciesielski; Michał Dramiński; Mieczyslaw A. Klopotek; Mariusz Kujawiak; Slawomir T. Wierzchon
In this research paper we point to the need to redesign the WebSOM document map creation algorithm. We argue that the SOM clustering should be preceded by identifying major topics of the document collection. Furthermore, the SOM clustering should be preceded by a pre-clustering process resulting in the creation of groups of documents with stronger relationships; the groups, not the documents, should be the subject of SOM clustering. We propose appropriate algorithms and report on the improvements achieved.
Archive | 2016
Michał Dramiński; Michał J. Da̧browski; Klev Diamanti; Jacek Koronacki; Jan Komorowski
The availability of very large data sets in Life Sciences provided earlier by the technological breakthroughs such as microarrays and more recently by various forms of sequencing has created both challenges in analyzing these data as well as new opportunities. A promising, yet underdeveloped approach to Big Data, not limited to Life Sciences, is the use of feature selection and classification to discover interdependent features. Traditionally, classifiers have been developed for the best quality of supervised classification. In our experience, more often than not, rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying observations (objects, samples) into distinct classes and what the interdependencies are between the features that describe the observation. Our underlying hypothesis is that the interdependent features and rule networks do not only reflect some syntactical properties of the data and classifiers but also may convey meaningful clues about true interactions in the modeled biological system. In this chapter we develop further our method of Monte Carlo Feature Selection and Interdependency Discovery (MCFS and MCFS-ID, respectively), which are particularly well suited for high-dimensional problems, i.e., those where each observation is described by very many features, often many more features than the number of observations. Such problems are abundant in Life Science applications. Specifically, we define Inter-Dependency Graphs (termed, somewhat confusingly, ID Graphs) that are directed graphs of interactions between features extracted by aggregation of information from the classification trees constructed by the MCFS algorithm. We then proceed with modeling interactions on a finer level with rule networks. We discuss some of the properties of the ID graphs and make a first attempt at validating our hypothesis on a large gene expression data set for CD4+ T-cells.
The MCFS-ID and ROSETTA including the Ciruvis approach offer a new methodology for analyzing Big Data from feature selection, through identification of feature interdependencies, to classification with rules according to decision classes, to construction of rule networks. Our preliminary results confirm that MCFS-ID is applicable to the identification of interacting features that are functionally relevant while rule networks offer a complementary picture with finer resolution of the interdependencies on the level of feature-value pairs.
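The aggregation step behind an ID Graph, as described in this abstract, can be illustrated with a small sketch. It is not the MCFS-ID implementation: the tree encoding, the function names, the gene labels, and the choice of a plain occurrence count as the edge weight are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical tree encoding: (feature, left_subtree, right_subtree); None is a leaf.
def parent_child_edges(tree):
    """Yield (parent_feature, child_feature) for each pair of adjacent split nodes."""
    if tree is None:
        return
    feature, left, right = tree
    for child in (left, right):
        if child is not None:
            yield feature, child[0]
            yield from parent_child_edges(child)

def id_graph(trees):
    """Aggregate directed parent -> child feature edges over a collection of trees."""
    edges = Counter()
    for tree in trees:
        edges.update(parent_child_edges(tree))
    return edges

# Three toy trees over hypothetical features g1, g2, g3.
trees = [
    ("g1", ("g2", None, None), ("g3", None, None)),
    ("g1", ("g2", None, None), None),
    ("g2", ("g3", None, None), None),
]

graph = id_graph(trees)
for (src, dst), weight in graph.most_common():
    print(f"{src} -> {dst}: {weight}")
```

Edges are directed from the split nearer the root to the split below it, so a heavy edge suggests that the second feature is often informative in the context of the first. A count is the simplest possible weight; other weightings could be substituted without changing the structure of the sketch.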
intelligent information systems | 2006
Krzysztof Ciesielski; Michał Dramiński; Mieczyslaw A. Klopotek; Dariusz Czerski; Slawomir T. Wierzchon
As document map creation algorithms like WebSOM are computationally expensive, and hardly reconstructible even from the same set of documents, a new methodology is urgently needed to allow construction of document maps that handle streams of new documents entering a document collection. This challenge is dealt with in this paper. In a multi-stage process, incrementality of a document map is warranted. The quality of the map generation process has been investigated based on a number of clustering and classification measures. Conclusions concerning the impact of the incremental, topic-sensitive approach on map quality are drawn.
Challenging Problems and Solutions in Intelligent Systems | 2016
Dariusz Czerski; Krzysztof Ciesielski; Michał Dramiński; Mieczyslaw A. Klopotek; Paweł Łoziński; Slawomir T. Wierzchon
We introduce a new semantic search engine, developed at our institute. Its unique feature is the automatic construction of semantic resources, such as the discovery of millions of facts and IS-A relations and the automated generation of sentiment analysis dictionaries. We developed a new method of document categorization. The engine can be queried in natural language and provides interfaces for use not only by humans but also by machines.
intelligent information systems | 2005
Michał Dramiński; Jacek Koronacki; Jan Komorowski
In the paper, three conceptually simple but computer-intensive versions of an approach to selecting informative genes for classification are proposed. All of them rely on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the genes. It is argued that the resulting ranking of genes can then be used to advantage for classification via a classifier of any type.