Dragi Kocev | Researchain

Publication

Featured research published by Dragi Kocev.

Pattern Recognition | 2012

An extensive experimental comparison of methods for multi-label learning

Gjorgji Madjarov; Dragi Kocev; Dejan Gjorgjevikj; Sašo Džeroski

Multi-label learning has received significant attention in the research community over the past few years, which has resulted in the development of a variety of multi-label learning methods. In this paper, we present an extensive experimental comparison of 12 multi-label learning methods using 16 evaluation measures over 11 benchmark datasets. We selected the competing methods based on their previous usage by the community, the representation of different groups of methods and the variety of basic underlying machine learning methods. Similarly, we selected the evaluation measures to be able to assess the behavior of the methods from a variety of viewpoints. To make the conclusions independent of the application domain, we use 11 datasets from different domains. Furthermore, we compare the methods by their efficiency in terms of the time needed to learn a classifier and the time needed to produce a prediction for an unseen example. We analyze the results of the experiments using the Friedman and Nemenyi tests to assess the statistical significance of differences in performance. The results of the analysis show that for multi-label classification the best performing methods overall are random forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC). Furthermore, RF-PCT exhibited the best performance according to all measures for multi-label ranking. The recommendation from this study is that when new methods for multi-label learning are proposed, they should be compared to RF-PCT and HOMER using multiple evaluation measures.
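The comparison described above rests on average ranks, the Friedman test and the Nemenyi post-hoc test. The following is a minimal Python sketch of that ranking procedure, not the authors' code. It assumes every score is "higher is better" (loss-type measures such as Hamming loss would have to be negated first), the function names are illustrative, and the critical values are the commonly tabulated two-tailed Nemenyi values for α = 0.05, which only cover 2 to 10 methods.

```python
import math
from typing import Dict, List, Sequence

# Two-tailed Nemenyi critical values q_alpha for alpha = 0.05, k = 2..10 methods
# (commonly tabulated values; comparing more methods needs an extended table).
Q_005: Dict[int, float] = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
                           7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}

def average_ranks(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[i][j] is the score of method j on dataset i (higher is better).
    Returns each method's rank averaged over datasets (1 = best); tied methods
    share the mean of the ranks they occupy."""
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for t in range(pos, end + 1):
                totals[order[t]] += shared
            pos = end + 1
    return [t / n for t in totals]

def friedman_chi2(avg_ranks: Sequence[float], n_datasets: int) -> float:
    """Friedman statistic, chi-square distributed with k-1 degrees of freedom."""
    k = len(avg_ranks)
    return (12 * n_datasets / (k * (k + 1))
            * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4))

def nemenyi_cd(k: int, n_datasets: int) -> float:
    """Critical difference between two average ranks at alpha = 0.05."""
    return Q_005[k] * math.sqrt(k * (k + 1) / (6 * n_datasets))

# Toy scores: 3 methods on 4 datasets (made-up numbers).
scores = [[0.90, 0.80, 0.70], [0.85, 0.80, 0.75], [0.70, 0.90, 0.60], [0.88, 0.88, 0.50]]
ranks = average_ranks(scores)
chi2 = friedman_chi2(ranks, len(scores))
cd = nemenyi_cd(3, len(scores))
```

Here the average ranks are 1.375, 1.625 and 3.0, and none of the pairwise gaps exceeds the critical difference of about 1.66, so with only four datasets no pair of methods would be declared significantly different.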

BMC Bioinformatics | 2010

Predicting gene function using hierarchical multi-label decision tree ensembles

Leander Schietgat; Celine Vens; Jan Struyf; Hendrik Blockeel; Dragi Kocev; Sašo Džeroski

Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology, and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.
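Respecting the hierarchy means that a gene annotated with a function must also carry every ancestor of that function. A minimal sketch of this hierarchy constraint, assuming a tree-shaped hierarchy stored as a child-to-parent map (GO is a directed acyclic graph, so it would need a list of parents per class). The class identifiers are hypothetical FunCat-style codes, and the thresholding rule in `predict_consistent` is only one simple way to obtain consistent predictions, not necessarily the one used in the paper.

```python
from typing import Dict, Iterable, Set

def ancestors(cls: str, parent: Dict[str, str]) -> Set[str]:
    """All ancestors of `cls` in a tree hierarchy given as a child -> parent map."""
    found: Set[str] = set()
    while cls in parent:
        cls = parent[cls]
        found.add(cls)
    return found

def close_under_hierarchy(classes: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Add every ancestor, so that the annotation obeys the hierarchy constraint."""
    closed = set(classes)
    for c in list(closed):
        closed |= ancestors(c, parent)
    return closed

def predict_consistent(scores: Dict[str, float], parent: Dict[str, str],
                       threshold: float = 0.5) -> Set[str]:
    """Predict a class only if it and all of its ancestors reach the threshold."""
    return {c for c, s in scores.items()
            if s >= threshold
            and all(scores.get(a, 0.0) >= threshold for a in ancestors(c, parent))}

# Hypothetical FunCat-style hierarchy: "1/1/3" is a child of "1/1", a child of "1".
parent = {"1/1": "1", "1/1/3": "1/1", "2/4": "2"}
annotation = close_under_hierarchy({"1/1/3", "2/4"}, parent)
predicted = predict_consistent(
    {"1": 0.9, "1/1": 0.4, "1/1/3": 0.7, "2": 0.6, "2/4": 0.55}, parent)
```

In this example `1/1/3` scores 0.7 but is not predicted, because its parent `1/1` scores only 0.4 and predicting the child alone would violate the hierarchy.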

Pattern Recognition | 2013

Tree ensembles for predicting structured outputs

Dragi Kocev; Celine Vens; Jan Struyf; Sašo Džeroski

In this paper, we address the task of learning models for predicting structured outputs. We consider both global and local predictions of structured outputs, the former based on a single model that predicts the entire output structure and the latter based on a collection of models, each predicting a component of the output structure. We use ensemble methods and apply them in the context of predicting structured outputs. We propose to build ensemble models consisting of predictive clustering trees, which generalize classification trees and have been used for predicting different types of structured outputs, both locally and globally. More specifically, we develop methods for learning two types of ensembles (bagging and random forests) of predictive clustering trees for global and local predictions of different types of structured outputs. The types of outputs considered correspond to different predictive modeling tasks: multi-target regression, multi-target classification, and hierarchical multi-label classification. Each of the combinations can be applied either in the context of global prediction (producing a single ensemble) or in the context of local prediction (producing a collection of ensembles). We conduct an extensive experimental evaluation across a range of benchmark datasets for each of the three types of structured outputs. We compare ensembles for global and local prediction, as well as single trees for global prediction and tree collections for local prediction, both in terms of predictive performance and in terms of efficiency (running times and model complexity). The results show that both global and local tree ensembles outperform their single-model counterparts in terms of predictive power. Global and local tree ensembles perform equally well, with global ensembles being more efficient, producing smaller models and needing fewer trees in the ensemble to achieve maximal performance.
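The global/local distinction can be illustrated with the smallest possible tree, a single split. The sketch below assumes one numeric feature and numeric targets (multi-target regression). The global model chooses one threshold by minimizing the summed per-target variance of the two partitions, a heuristic in the spirit of predictive clustering trees, while the local approach fits an independent single-target stump for each component of the output. All names are hypothetical, and real predictive clustering trees are grown much deeper.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Row = List[float]
Stump = Tuple[float, Row, Row]  # (threshold, left prediction, right prediction)

def summed_variance(Y: Sequence[Row]) -> float:
    """Impurity of a set of target vectors: the sum of per-target variances."""
    return sum(pvariance(col) for col in zip(*Y))

def column_means(Y: Sequence[Row]) -> Row:
    """Component-wise mean of equal-length target vectors."""
    return [sum(col) / len(col) for col in zip(*Y)]

def fit_stump(x: Sequence[float], Y: Sequence[Row]) -> Stump:
    """Choose the threshold on x minimizing the size-weighted impurity of both sides.
    Assumes x contains at least two distinct values."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [y for xi, y in zip(x, Y) if xi <= t]
        right = [y for xi, y in zip(x, Y) if xi > t]
        score = len(left) * summed_variance(left) + len(right) * summed_variance(right)
        if best is None or score < best[0]:
            best = (score, t, column_means(left), column_means(right))
    return best[1], best[2], best[3]

def predict_stump(stump: Stump, xi: float) -> Row:
    t, left, right = stump
    return left if xi <= t else right

def fit_global(x: Sequence[float], Y: Sequence[Row]) -> Stump:
    """Global prediction: one model for the whole target vector."""
    return fit_stump(x, Y)

def fit_local(x: Sequence[float], Y: Sequence[Row]) -> List[Stump]:
    """Local prediction: one single-target model per output component."""
    return [fit_stump(x, [[y[j]] for y in Y]) for j in range(len(Y[0]))]

def predict_local(stumps: List[Stump], xi: float) -> Row:
    return [predict_stump(s, xi)[0] for s in stumps]

x = [1.0, 2.0, 3.0, 4.0]
Y = [[0.0, 10.0], [0.0, 10.0], [1.0, 20.0], [1.0, 20.0]]
global_model = fit_global(x, Y)
local_models = fit_local(x, Y)
```

On this toy data both approaches make the same predictions; the difference is that the global approach yields one model for the whole output vector, while the local approach yields as many models as there are targets.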

European Conference on Machine Learning | 2007

Ensembles of Multi-Objective Decision Trees

Dragi Kocev; Celine Vens; Jan Struyf; Sašo Džeroski

Ensemble methods are able to improve the predictive performance of many base classifiers. Until now, they have been applied to classifiers that predict a single target attribute. Given the non-trivial interactions that may occur among the different targets in multi-objective prediction tasks, it is unclear whether ensemble methods also improve the performance in this setting. In this paper, we consider two ensemble learning techniques, bagging and random forests, and apply them to multi-objective decision trees (MODTs), which are decision trees that predict multiple target attributes at once. We empirically investigate the performance of ensembles of MODTs. Our most important conclusions are: (1) ensembles of MODTs yield better predictive performance than MODTs, and (2) ensembles of MODTs are as good as, or better than, ensembles of single-objective decision trees, i.e., a separate ensemble for each target. Moreover, ensembles of MODTs have smaller model size and are faster to learn than ensembles of single-objective decision trees.
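Bagging and random forests differ mainly in how each ensemble member is grown. A minimal sketch, assuming depth-one multi-objective trees (stumps) with numeric targets as members: bagging trains each member on a bootstrap sample, the random-forest variant additionally restricts each split to a random subset of `max_features` features, and the ensemble predicts the component-wise average of the members' target vectors. The class and function names are hypothetical, and real MODTs are grown to full depth rather than stopped after one split.

```python
import random
from statistics import pvariance
from typing import List, Optional, Sequence

Row = List[float]

def impurity(Y: Sequence[Row]) -> float:
    """Summed per-target variance of a set of target vectors."""
    return sum(pvariance(col) for col in zip(*Y))

def column_means(Y: Sequence[Row]) -> Row:
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*Y)]

class MultiTargetStump:
    """A depth-one multi-objective tree. With max_features set, only a random
    subset of features is considered at the split (random-forest style)."""

    def __init__(self, rng: random.Random, max_features: Optional[int] = None):
        self.rng = rng
        self.max_features = max_features

    def fit(self, X: Sequence[Row], Y: Sequence[Row]) -> "MultiTargetStump":
        features = list(range(len(X[0])))
        if self.max_features is not None:
            features = self.rng.sample(features, min(self.max_features, len(features)))
        self.mean = column_means(Y)
        self.split = None
        best = impurity(Y)  # accept only splits that reduce the impurity
        for f in features:
            for t in sorted({x[f] for x in X})[:-1]:
                left = [y for x, y in zip(X, Y) if x[f] <= t]
                right = [y for x, y in zip(X, Y) if x[f] > t]
                score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(Y)
                if score < best:
                    best = score
                    self.split = (f, t, column_means(left), column_means(right))
        return self

    def predict(self, x: Row) -> Row:
        if self.split is None:
            return self.mean
        f, t, left_mean, right_mean = self.split
        return left_mean if x[f] <= t else right_mean

def fit_ensemble(X, Y, n_members=25, max_features=None, seed=0):
    """Bagging when max_features is None; random-forest style otherwise."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        stump = MultiTargetStump(rng, max_features)
        members.append(stump.fit([X[i] for i in idx], [Y[i] for i in idx]))
    return members

def predict_ensemble(members, x):
    """Average the members' predicted target vectors."""
    return column_means([m.predict(x) for m in members])

# Toy data: feature 0 determines both targets, feature 1 is noise.
X = [[0, 0], [0, 1], [0, 2], [0, 3], [1, 0], [1, 1], [1, 2], [1, 3]]
Y = [[0.0, 0.0]] * 4 + [[1.0, 10.0]] * 4
bagged = fit_ensemble(X, Y)
forest = fit_ensemble(X, Y, max_features=1)
prediction = predict_ensemble(bagged, [1, 0])
```

A single multi-objective member covers all targets at once, which is the source of the smaller model size noted in the abstract: a per-target ensemble would need one set of members for each target.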

Pattern Recognition | 2011

Hierarchical annotation of medical images

Ivica Dimitrovski; Dragi Kocev; Suzana Loskovska; Sašo Džeroski

We present a hierarchical multi-label classification (HMC) system for medical image annotation. HMC is a variant of classification where an instance may belong to multiple classes at the same time and these classes/labels are organized in a hierarchy. Our approach to HMC exploits the annotation hierarchy by building a single predictive clustering tree (PCT) that can simultaneously predict all annotations of an image. Hence, PCTs are very efficient: a single classifier is valid for the hierarchical semantics as a whole, as compared to other approaches that produce many classifiers, each valid just for one given class. To improve performance, we construct ensembles of PCTs. We evaluate our system on the IRMA database that consists of X-ray images. We investigate its performance under a variety of conditions. To begin with, we consider two ensemble approaches, bagging and random forests. Next, we use several state-of-the-art feature extraction approaches and combinations thereof. Finally, we employ two types of feature fusion, i.e., low-level and high-level fusion. The experiments show that our system outperforms the best-performing approach from the literature (a collection of SVMs, each predicting one label at the lowest level of the hierarchy), both in terms of error and efficiency. This holds across a range of descriptors and descriptor combinations, regardless of the type of feature fusion used. To stress the generality of the proposed approach, we have also applied it to the automatic annotation of a large number of consumer photos with multiple annotations organized in a semantic hierarchy. The obtained results show that this approach is general and easily applicable in different domains, offering state-of-the-art performance.
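Low-level fusion combines descriptors before learning, whereas high-level fusion combines predictions after learning. A minimal sketch of both, assuming each descriptor is a plain list of floats and each per-descriptor classifier returns a score per label. The label names are hypothetical, and simple averaging is only one possible high-level combination rule; the abstract does not state which rule was used.

```python
from typing import Dict, List, Sequence

Vector = List[float]

def low_level_fusion(descriptors: Sequence[Vector]) -> Vector:
    """Early fusion: concatenate an image's descriptors into one feature vector,
    which is then given to a single classifier."""
    fused: Vector = []
    for d in descriptors:
        fused.extend(d)
    return fused

def high_level_fusion(scores: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Late fusion: one classifier per descriptor has produced a score per label;
    combine them by averaging (an assumed rule)."""
    labels = scores[0].keys()
    return {lab: sum(s[lab] for s in scores) / len(scores) for lab in labels}

# Hypothetical descriptors of one image and per-descriptor label scores.
fused = low_level_fusion([[0.1, 0.2], [0.5], [0.3, 0.4, 0.6]])
combined = high_level_fusion([{"chest": 0.8, "hand": 0.1},
                              {"chest": 0.6, "hand": 0.3}])
```

Low-level fusion trains one model on a longer vector, while high-level fusion trains one model per descriptor and therefore multiplies training cost by the number of descriptors.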

Ecological Informatics | 2012

Hierarchical classification of diatom images using ensembles of predictive clustering trees

Ivica Dimitrovski; Dragi Kocev; Suzana Loskovska; Sašo Džeroski

This paper presents a hierarchical multi-label classification (HMC) system for diatom image classification. HMC is a variant of classification where an instance may belong to multiple classes at the same time and these classes/labels are organized in a hierarchy. Our approach to HMC exploits the classification hierarchy by building a single predictive clustering tree (PCT) that can simultaneously predict all levels in the hierarchy of taxonomic ranks: genus, species, variety, and form. Hence, PCTs are very efficient: a single classifier is valid for the hierarchical classification scheme as a whole. To improve the predictive performance of the PCTs, we construct ensembles of PCTs. We evaluate our system on the ADIAC database of diatom images. We apply several feature extraction techniques that are suitable for diatom images. Moreover, we investigate whether combining these techniques increases predictive performance. The results show that ensembles of PCTs have better predictive performance and are more efficient than SVMs. Furthermore, the proposed system outperforms the most widely used approaches for image annotation. Finally, we demonstrate how the system can be used by taxonomists to annotate new diatom images.

Computerized Medical Imaging and Graphics | 2015

Improved medical image modality classification using a combination of visual and textual features

Ivica Dimitrovski; Dragi Kocev; Ivan Kitanovski; Suzana Loskovska; Sašo Džeroski

In this paper, we present the approach that we applied to the medical modality classification tasks at the ImageCLEF evaluation forum. More specifically, we used the modality classification databases from the ImageCLEF competitions in 2011, 2012 and 2013, described by four types of visual features and one type of textual features, and combinations thereof. As visual features, we used local binary patterns, color and edge directivity descriptors, fuzzy color and texture histograms, and the scale-invariant feature transform (SIFT, together with its variant opponentSIFT); as textual features, we used the standard bag-of-words representation with TF-IDF weighting. The results of the extensive experimental evaluation identify SIFT and opponentSIFT as the best-performing features for modality classification. Next, the low-level fusion of the visual features improves the predictive performance of the classifiers, because different features capture different aspects of an image and their combination offers a more complete representation of its visual content. Moreover, adding textual features further increases the predictive performance. Finally, the results obtained with our approach are the best results reported on these databases so far.
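The textual representation pairs bag-of-words counts with TF-IDF weighting. A minimal sketch, assuming pre-tokenized image captions and the common `count × log(N / df)` variant (the abstract does not specify the exact formula). The resulting vector is then concatenated with a visual descriptor, as in low-level fusion. Function names, tokens and the visual vector are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by its raw count in a document times log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({t: c * math.log(n / df[t]) for t, c in counts.items()})
    return weighted

def to_vector(weights: Dict[str, float], vocabulary: List[str]) -> List[float]:
    """Lay the sparse weights out in a fixed vocabulary order (0.0 if absent)."""
    return [weights.get(term, 0.0) for term in vocabulary]

docs = [["ct", "scan", "ct"], ["mri", "scan"]]
w = tfidf(docs)
vocab = sorted({t for d in docs for t in d})  # ['ct', 'mri', 'scan']
visual = [0.25, 0.75]  # hypothetical visual descriptor of the first image
fused = visual + to_vector(w[0], vocab)
```

A term that occurs in every document, such as `scan` here, receives weight zero, so it contributes nothing to distinguishing modalities.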

Information Sciences | 2016

Improving bag-of-visual-words image retrieval with predictive clustering trees

Ivica Dimitrovski; Dragi Kocev; Suzana Loskovska; Sašo Džeroski

The recent overwhelming increase in the amount of available visual information, especially digital images, has created a pressing need to develop efficient and accurate systems for image retrieval. State-of-the-art systems for image retrieval use the bag-of-visual-words representation of images. However, the computational bottleneck in all such systems is the construction of the visual codebook, i.e., obtaining the visual words. This is typically performed by clustering hundreds of thousands or millions of local descriptors, where the resulting clusters correspond to visual words. Each image is then represented by a histogram of the distribution of its local descriptors across the codebook. The major issue in retrieval systems is that as the image databases grow, the number of local descriptors to be clustered increases rapidly, which makes conventional clustering techniques infeasible. Considering this, we propose to construct the visual codebook by using predictive clustering trees (PCTs), which can be constructed and executed efficiently and have good predictive performance. Moreover, to increase the stability of the model, we propose to use random forests of predictive clustering trees. We create a random forest of PCTs that represents both the codebook and the indexing structure. We evaluate the proposed improvement of the bag-of-visual-words approach on three reference datasets and two additional datasets of 100K and 1M images, and compare it to two state-of-the-art methods based on approximate k-means and extremely randomized tree ensembles. The results reveal that the proposed method produces a visual codebook with superior discriminative power, and thus better retrieval performance, while maintaining excellent computational efficiency.
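In a conventional pipeline, each local descriptor is assigned to its nearest cluster center; with a tree-based codebook, a descriptor is instead routed down a tree, and the leaf it reaches acts as its visual word. A minimal sketch of this quantization step, assuming hand-built binary trees over raw descriptor components and assuming that the leaves of all trees in the forest together form the vocabulary. How the paper actually combines the trees and how the PCTs are learned are not shown, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical tree encoding: a leaf is an int (its visual-word id within the tree);
# an internal node is (feature index, threshold, left subtree, right subtree).
Node = Union[int, Tuple]

def leaf_of(node: Node, descriptor: Sequence[float]) -> int:
    """Route one local descriptor to a leaf, i.e. quantize it to a visual word."""
    while not isinstance(node, int):
        feature, threshold, left, right = node
        node = left if descriptor[feature] <= threshold else right
    return node

def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   trees: Sequence[Node], leaves_per_tree: Sequence[int]) -> List[float]:
    """L1-normalized histogram over the leaves of all trees (the forest codebook)."""
    hist = [0.0] * sum(leaves_per_tree)
    offset = 0
    for tree, n_leaves in zip(trees, leaves_per_tree):
        for d in descriptors:
            hist[offset + leaf_of(tree, d)] += 1.0
        offset += n_leaves  # leaves of the next tree get the following bins
    total = sum(hist)
    return [h / total for h in hist] if total else hist

tree_a = (0, 0.5, 0, (1, 0.5, 1, 2))  # 3 leaves
tree_b = (1, 0.3, 0, 1)               # 2 leaves
image = [[0.2, 0.9], [0.8, 0.1], [0.9, 0.7]]  # toy 2-D "local descriptors"
hist = bovw_histogram(image, [tree_a, tree_b], [3, 2])
```

Routing a descriptor costs one comparison per tree level, instead of a distance computation against every cluster center, which is why tree-based codebooks scale to large descriptor sets.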

Frontiers in Microbiology | 2014

Chaophilic or chaotolerant fungi: a new category of extremophiles?

Janja Zajc; Sašo Džeroski; Dragi Kocev; Aharon Oren; Silva Sonjak; Rok Tkavc; Nina Gunde-Cimerman

It is well known that a few halophilic bacteria and archaea, as well as certain fungi, can grow at the highest concentrations of NaCl. However, data about possible life at extremely high concentrations of various kosmotropic (stabilizing; like NaCl, KCl and MgSO4) and chaotropic (destabilizing; NaBr, MgCl2 and CaCl2) salts are scarce for prokaryotes and almost absent for the eukaryotic domain, including fungi. Fungi from diverse (extreme) environments were tested for their ability to grow at the highest concentrations of kosmotropic and chaotropic salts ever recorded to support life. The majority of fungi showed a preference for relatively high concentrations of kosmotropes. However, our study revealed the outstanding tolerance of several fungi to high concentrations of MgCl2 (up to 2.1 M) or CaCl2 (up to 2.0 M) without compensating kosmotropic salts. A few species, for instance Hortaea werneckii, Eurotium amstelodami, Eurotium chevalieri and Wallemia ichthyophaga, are able to thrive in media with the highest salinities of all salts (except for CaCl2 in the case of W. ichthyophaga). The upper concentration of MgCl2 that supports fungal life in the absence of kosmotropes (2.1 M) is much higher than the previously determined upper limit for microbial growth (1.26 M). No fungal representatives showed an exclusive preference for chaotropic salts (i.e., none were obligate chaophiles). Nevertheless, our study expands the knowledge of possible active life by a diverse set of fungi in biologically detrimental chaotropic environments.

Fungal Diversity | 2016

Halophily reloaded: new insights into the extremophilic life-style of Wallemia with the description of Wallemia hederae sp. nov.

Sašo Jančič; Dragi Kocev; Hans-Josef Schroers; Sašo Džeroski; Nina Gunde-Cimerman

Wallemia comprises air- and food-borne, mycotoxigenic contaminants, including the halophilic W. ichthyophaga, the xerotolerant W. sebi and the xerophilic W. muriae. Wallemia isolates are easily overlooked, and only a comparably small number of strains have been deposited in culture collections so far. To better understand the natural distribution of Wallemia spp. and to identify their natural habitats, we tested more than 300 low-water-activity substrates and 30 air samples with a wide geographical coverage. We isolated more than 150 new Wallemia strains. Wallemia sebi and W. muriae were isolated mostly from hypersaline water, low-water-activity foods, plant materials and indoor environments. Wallemia muriae is the dominant Wallemia species in the air of natural and human-influenced environments in Europe. New isolates of W. ichthyophaga were obtained from hypersaline environments such as brine, salt crystals, salty foods and MgCl2-rich bitterns, and from the air of hay barns in Denmark. Five halotolerant strains were recognised as a hitherto undescribed species, Wallemia hederae, the phylogenetic sister of the halophilic W. ichthyophaga. Wallemia spp. show in vitro growth on media that contain the chaotropic salt MgCl2. Wallemia ichthyophaga can grow in liquid medium enriched with 2 M MgCl2. Never before has a microorganism been grown at comparably high MgCl2 concentrations. Tests of the activity of a wide range of extracellular enzymes in the presence of NaCl also suggested that Wallemia is well adapted to substrates with a reduced water activity.
