Saso Dzeroski | Researchain

Publication

Featured research published by Saso Dzeroski.

The Data Mining and Knowledge Discovery Handbook | 2001

Relational Data Mining

Saso Dzeroski; Nada Lavrač

EWSL'91: Proceedings of the 5th European Working Session on Learning | 1991

Learning Nonrecursive Definitions of Relations with LINUS

Nada Lavrač; Saso Dzeroski; Marko Grobelnik

Many successful inductive learning systems use a propositional attribute-value language to represent both training examples and induced hypotheses. Recent developments are concerned with systems that induce concept descriptions in first-order logic. The deductive hierarchical database (DHDB) formalism is a restricted form of Horn clause logic in which nonrecursive logical definitions of relations can be expressed. Having variables, compound terms and predicates, the DHDB formalism allows for more compact descriptions of concepts than an attribute-value language. Our inductive learning system LINUS uses the DHDB formalism to represent concepts as definitions of relations. The paper gives a description of LINUS and presents the results of its successful application to several inductive learning tasks taken from the machine learning literature. A comparison with the results of other first-order learning systems is given as well.

European Conference on Principles of Data Mining and Knowledge Discovery | 2000

Combining Multiple Models with Meta Decision Trees

Ljupco Todorovski; Saso Dzeroski

The paper introduces meta decision trees (MDTs), a novel method for combining multiple models. Instead of giving a prediction, MDT leaves specify which model should be used to obtain a prediction. We present an algorithm for learning MDTs based on the C4.5 algorithm for learning ordinary decision trees (ODTs). An extensive experimental evaluation of the new algorithm is performed on twenty-one data sets, combining models generated by five learning algorithms: two algorithms for learning decision trees, a rule learning algorithm, a nearest neighbor algorithm and a naive Bayes algorithm. In terms of performance, MDTs combine models better than voting and stacking with ODTs. In addition, MDTs are much more concise than ODTs used for stacking and are thus a step towards comprehensible combination of multiple models.
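The leaf semantics described in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows prediction with a hand-built tree whose leaves name a base model instead of a class, and it leaves out the C4.5-based learning step entirely. The `Leaf`, `Node`, and `mdt_predict` names, the toy base models, and the confidence test at the root are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# A base model maps an instance (a dict of features) to class probabilities.
Distribution = Dict[str, float]
Model = Callable[[dict], Distribution]


@dataclass
class Leaf:
    model_name: str  # names the base model to consult, not a class label


@dataclass
class Node:
    # The test may inspect the instance and the base models' outputs
    # (an assumption made for this sketch).
    test: Callable[[dict, Dict[str, Distribution]], bool]
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def mdt_predict(tree: Tree, models: Dict[str, Model], x: dict):
    """Walk the tree to a leaf, then let the model named there predict."""
    outputs = {name: model(x) for name, model in models.items()}
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.test(x, outputs) else node.if_false
    dist = outputs[node.model_name]
    return node.model_name, max(dist, key=dist.get)


# Hypothetical toy base models, for illustration only.
def rule_model(x: dict) -> Distribution:
    p = 0.9 if x["length"] < 5 else 0.55
    return {"pos": p, "neg": 1 - p}


def nb_model(x: dict) -> Distribution:
    return {"pos": 0.2, "neg": 0.8}


models = {"rule": rule_model, "nb": nb_model}

# Root test: trust the rule model only when it is confident.
tree = Node(
    test=lambda x, out: max(out["rule"].values()) >= 0.8,
    if_true=Leaf("rule"),
    if_false=Leaf("nb"),
)

print(mdt_predict(tree, models, {"length": 3}))
print(mdt_predict(tree, models, {"length": 10}))
```

Because each leaf only points at a model, the tree itself stays small, which is in the spirit of the conciseness claim made in the abstract.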

Applied Artificial Intelligence | 2000

Noise detection and elimination in data preprocessing: Experiments in medical domains

Dragan Gamberger; Nada Lavrač; Saso Dzeroski

Compression measures used in inductive learners, such as measures based on the minimum description length principle, can be used as a basis for grading candidate hypotheses. Compression-based induction is suited also for handling noisy data. This paper shows that a simple compression measure can be used to detect noisy training examples, where noise is due to random classification errors. A technique is proposed in which noisy examples are detected and eliminated from the training set, and a hypothesis is then built from the set of remaining examples. This noise elimination method was applied to preprocess data for four machine-learning algorithms, and evaluated on selected medical domains.

European Conference on Principles of Data Mining and Knowledge Discovery | 1999

Simultaneous Prediction of Multiple Chemical Parameters of River Water Quality with TILDE

Hendrik Blockeel; Saso Dzeroski; Jasna Grbovic

Environmental studies form an increasingly popular application domain for machine learning and data mining techniques. In this paper we consider two applications of decision tree learning in the domain of river water quality: a) the simultaneous prediction of multiple physico-chemical properties of the water from its biological properties using a single decision tree (as opposed to learning a different tree for each different property) and b) the prediction of past physico-chemical properties of the river water from its current biological properties. We discuss some experimental results that we believe are interesting both to the application domain experts and to the machine learning community.
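One common way to let a single tree predict several numeric targets at once is to score each candidate split by the variance summed over all targets and to store a vector of means in each leaf. The sketch below illustrates that general idea only; it is not claimed to be the heuristic TILDE uses, and the function names, field names, and data layout are hypothetical.

```python
from statistics import pvariance


def summed_variance(rows, targets):
    """Sum of per-target population variances over a set of rows."""
    if not rows:
        return 0.0
    # Assumption: targets are on comparable scales; in practice they
    # would usually be normalized before their variances are summed.
    return sum(pvariance([r[t] for r in rows]) for t in targets)


def best_split(rows, feature, targets):
    """Return (threshold, score) minimizing size-weighted summed variance."""
    best = None
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for r in rows if r[feature] <= threshold]
        right = [r for r in rows if r[feature] > threshold]
        score = (
            len(left) * summed_variance(left, targets)
            + len(right) * summed_variance(right, targets)
        ) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


def leaf_prediction(rows, targets):
    """A leaf predicts the mean of every target simultaneously."""
    return {t: sum(r[t] for r in rows) / len(rows) for t in targets}
```

A split chosen this way has to separate the examples well on all targets together, which is what distinguishes it from learning one tree per property.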

International Conference on Data Mining | 2008

OntoDM: An Ontology of Data Mining

Panče Panov; Saso Dzeroski; Larisa N. Soldatova

Motivated by the need for unification of the field of data mining and the growing demand for formalized representation of outcomes of research, we address the task of constructing an ontology of data mining. The proposed ontology, named OntoDM, is based on a recent proposal of a general framework for data mining, and includes definitions of basic data mining entities, such as datatype and dataset, data mining task, data mining algorithm and components thereof (e.g., distance function), etc. It also allows for the definition of more complex entities, e.g., constraints in constraint-based data mining, sets of such constraints (inductive queries) and data mining scenarios (sequences of inductive queries). Unlike most existing approaches to constructing ontologies of data mining, OntoDM is a deep/heavy-weight ontology and follows best practices in ontology engineering, such as not allowing multiple inheritance of classes, using a predefined set of relations and using a top level ontology.

Algorithmic Learning Theory | 1996

Noise Elimination in Inductive Concept Learning: A Case Study in Medical Diagnosis

Dragan Gamberger; Nada Lavrač; Saso Dzeroski

Compression measures used in inductive learners, such as measures based on the MDL (Minimum Description Length) principle, provide a theoretically justified basis for grading candidate hypotheses. Compression-based induction is appropriate also for handling of noisy data. This paper shows that a simple compression measure can be used to detect noisy examples. A technique is proposed in which noisy examples are detected and eliminated from the training set, and a hypothesis is then built from the set of remaining examples. The separation of noise detection and hypothesis formation has the advantage that noisy examples do not influence hypothesis construction as opposed to most standard approaches to noise handling in which the learner typically tries to avoid overfitting the noisy example set. This noise elimination method is applied to a problem of early diagnosis of rheumatic diseases which is known to be a difficult problem, due both to its nature and to the imperfections in the dataset. The method is evaluated by applying the noise elimination algorithm in conjunction with the CN2 rule induction algorithm, and by comparing their performance to earlier results obtained by CN2 in this diagnostic domain.

PLOS ONE | 2013

Gut Microbiota Patterns Associated with Colonization of Different Clostridium difficile Ribotypes

Jure Škraban; Saso Dzeroski; Bernard Zenko; Domen Mongus; Simon Gangl; Maja Rupnik

C. difficile infection is associated with disturbed gut microbiota and changes in relative frequencies and abundance of individual bacterial taxa have been described. In this study we have analysed bacterial, fungal and archaeal microbiota by denaturing high pressure liquid chromatography (DHPLC) and with machine learning methods in 208 faecal samples from healthy volunteers and in routine samples with requested C. difficile testing. The latter were further divided according to stool consistency, C. difficile presence or absence and C. difficile ribotype (027 or non-027). Lower microbiota diversity was a common trait of all routine samples and not necessarily connected only to C. difficile colonisation. Differences between the healthy donors and C. difficile positive routine samples were detected in bacterial, fungal and archaeal components. Bifidobacterium longum was the single most important species associated with C. difficile negative samples. However, by machine learning approaches we have identified patterns of microbiota composition predictive of C. difficile colonization. Those patterns also differed between samples with C. difficile ribotype 027 and other C. difficile ribotypes. The results indicate that not only the presence of a single species/group is important but that certain combinations of gut microbes are associated with C. difficile carriage and that some ribotypes (027) might be associated with more disturbed microbiota than the others.

Multiple Classifier Systems | 2002

Stacking with Multi-response Model Trees

Saso Dzeroski; Bernard Zenko

We empirically evaluate several state-of-the-art methods for constructing ensembles of classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. We then propose a new method for stacking, that uses multi-response model trees at the meta-level, and show that it outperforms existing stacking approaches, as well as selecting the best classifier from the ensemble by cross validation.

Applied Artificial Intelligence | 1998

Diterpene structure elucidation from 13C NMR spectra with inductive logic programming

Saso Dzeroski; Steffen Schulze-Kremer; Karsten R. Heidtke; Karsten Siems; Dietrich Wettschereck; Hendrik Blockeel

We present a novel application of Inductive Logic Programming (ILP) to the problem of diterpene structure elucidation from 13C NMR spectra. Diterpenes are organic compounds of low molecular weight with a skeleton of 20 carbon atoms. They are of significant chemical and commercial interest because of their use as lead compounds in the search for new pharmaceutical effectors. The interpretation of diterpene 13C NMR spectra normally requires specialists with detailed spectroscopic knowledge and substantial experience in natural products chemistry, specifically knowledge on peak patterns and chemical structures. Given a database of peak patterns for diterpenes with known structure, we apply several ILP approaches to discover correlations between peak patterns and chemical structure. The approaches used include first-order inductive learning, relational instance-based learning, induction of logical decision trees, and inductive constraint logic. Performance close to that of domain experts is achieved, which suffi...
