Ivan Bratko
University of Ljubljana
Publications
Featured research published by Ivan Bratko.
EWSL'91: Proceedings of the 5th European Working Session on Learning | 1991
Bojan Cestnik; Ivan Bratko
In this paper we introduce a new method for decision tree pruning, based on the minimisation of expected classification error, following the method of Niblett and Bratko. The original Niblett-Bratko pruning algorithm uses Laplace probability estimates. Here we introduce a new, more general Bayesian approach to estimating probabilities, which we call m-probability-estimation. By varying a parameter m in this method, tree pruning can be adjusted to particular properties of the learning domain, such as the level of noise. The resulting pruning method improves on the original Niblett-Bratko pruning in the following respects: a priori probabilities can be incorporated into error estimation, several trees pruned to various degrees can be generated, and the degree of pruning is not affected by the number of classes. These improvements are supported by experimental findings. m-probability-estimation also enables the combination of learning data obtained from various sources.
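The m-estimate itself has a simple closed form, p = (n_c + m·p_a) / (n + m); a minimal sketch of that formula follows (the function and variable names are illustrative, not from the paper):

```python
def m_estimate(n_c, n, p_a, m):
    """m-probability-estimate of a class probability at a tree node.

    n_c : examples of the class reaching the node
    n   : all examples reaching the node
    p_a : a priori probability of the class
    m   : tuning parameter; a larger m pulls the estimate toward
          the prior, giving heavier pruning in noisy domains
    """
    return (n_c + m * p_a) / (n + m)

# m = 0 gives the relative frequency n_c / n; with k classes,
# m = k and a uniform prior p_a = 1/k recover the Laplace estimate
# (n_c + 1) / (n + k) used by the original Niblett-Bratko pruning.
print(m_estimate(8, 10, 0.5, 2))  # -> 0.75
```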
Machine Learning | 1994
Marko Bohanec; Ivan Bratko
When communicating concepts, it is often convenient or even necessary to define a concept approximately. A simple, although only approximately accurate concept definition may be more useful than a completely accurate definition which involves a lot of detail. This paper addresses the problem: given a completely accurate, but complex, definition of a concept, simplify the definition, possibly at the expense of accuracy, so that the simplified definition still corresponds to the concept “sufficiently” well. Concepts are represented by decision trees, and the method of simplification is tree pruning. Given a decision tree that accurately specifies a concept, the problem is to find a smallest pruned tree that still represents the concept within some specified accuracy. A pruning algorithm is presented that finds an optimal solution by generating a dense sequence of pruned trees, decreasing in size, such that each tree has the highest accuracy among all the possible pruned trees of the same size. An efficient implementation of the algorithm, based on dynamic programming, is presented and empirically compared with three progressive pruning algorithms using both artificial and real-world data. An interesting empirical finding is that the real-world data generally allow significantly greater simplification at equal loss of accuracy.
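The size-accuracy table at the heart of such an optimal pruning search can be computed bottom-up. The sketch below is a generic dynamic program over an assumed dict-based tree, counting size in leaves and accuracy in correctly classified training examples; it illustrates the idea rather than reproducing the paper's implementation:

```python
def best_pruned(node):
    """Return {num_leaves: max_correct} over all prunings of `node`.

    `node` is a dict with "majority_count" (training examples of the
    majority class reaching the node) and an optional "children" list.
    """
    table = {1: node["majority_count"]}     # option: prune to a leaf here
    combined = {0: 0}
    for child in node.get("children", []):
        child_table = best_pruned(child)
        merged = {}
        for s1, c1 in combined.items():     # knapsack-style combination
            for s2, c2 in child_table.items():
                s = s1 + s2
                merged[s] = max(merged.get(s, -1), c1 + c2)
        combined = merged
    if node.get("children"):
        for s, c in combined.items():       # option: keep this split
            table[s] = max(table.get(s, -1), c)
    return table

tree = {"majority_count": 6,
        "children": [{"majority_count": 4}, {"majority_count": 5}]}
print(best_pruned(tree))  # -> {1: 6, 2: 9}
```

Reading the root's table off size by size yields, for each size, the best achievable accuracy, i.e. the dense sequence of optimal pruned trees the abstract describes.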
european conference on principles of data mining and knowledge discovery | 2003
Aleks Jakulin; Ivan Bratko
Many effective and efficient learning algorithms assume independence of attributes. They often perform well even in domains where this assumption is not really true. However, they may fail badly when the degree of attribute dependencies becomes critical. In this paper, we examine methods for detecting deviations from independence. These dependencies give rise to “interactions” between attributes which affect the performance of learning algorithms. We first formally define the degree of interaction between attributes through the deviation of the best possible “voting” classifier from the true relation between the class and the attributes in a domain. Then we propose a practical heuristic for detecting attribute interactions, called interaction gain. We experimentally investigate the suitability of interaction gain for handling attribute interactions in machine learning. We also propose visualization methods for graphical exploration of interactions in a domain.
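The interaction gain heuristic can be written entirely in terms of entropies, IG(A;B;C) = I(A,B;C) − I(A;C) − I(B;C). A small self-contained sketch using plain empirical entropies (the names are mine, not from the paper):

```python
from collections import Counter
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def H(samples):
    """Empirical entropy of a sequence of (possibly tuple) symbols."""
    return entropy(list(Counter(samples).values()))

def interaction_gain(a, b, c):
    """IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), via entropies.

    Positive: A and B carry synergistic information about the class C.
    Negative: A and B are partly redundant with respect to C.
    """
    i_ab_c = H(list(zip(a, b))) + H(c) - H(list(zip(a, b, c)))
    i_a_c = H(a) + H(c) - H(list(zip(a, c)))
    i_b_c = H(b) + H(c) - H(list(zip(b, c)))
    return i_ab_c - i_a_c - i_b_c

# XOR: each attribute alone says nothing about the class, but the
# pair determines it, so the interaction gain is a full bit.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [x ^ y for x, y in zip(a, b)]
print(interaction_gain(a, b, c))  # -> 1.0
```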
Bioinformatics | 2005
Tomaz Curk; Janez Demšar; Qikai Xu; Gregor Leban; Uroš Petrovič; Ivan Bratko; Gad Shaulsky; Blaz Zupan
Visual programming offers an intuitive means of combining known analysis and visualization methods into powerful applications. The system presented here enables users who are not programmers to manage microarray and genomic data flow and to customize their analyses by combining common data analysis tools to fit their needs.
Availability: http://www.ailab.si/supp/bi-visprog
Supplementary information: http://www.ailab.si/supp/bi-visprog
international conference on machine learning | 2004
Aleks Jakulin; Ivan Bratko
Attribute interactions are the irreducible dependencies between attributes. Interactions underlie feature relevance and selection, and the structure of joint probability and classification models: if and only if the attributes interact, they should be connected. While the issue of 2-way interactions, especially of those between an attribute and the label, has already been addressed, we introduce an operational definition of a generalized n-way interaction by highlighting two models: the reductionistic part-to-whole approximation, where the model of the whole is reconstructed from models of the parts, and the holistic reference model, where the whole is modelled directly. An interaction is deemed significant if these two models are significantly different. In this paper, we propose the Kirkwood superposition approximation for constructing part-to-whole approximations. To model data, we do not assume a particular structure of interactions, but instead construct the model by testing for the presence of interactions. The resulting map of significant interactions is a graphical model learned from the data. We confirm that the P-values computed with the assumption of the asymptotic χ2 distribution closely match those obtained with the bootstrap.
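The Kirkwood superposition approximation reconstructs a 3-way joint distribution from its pairwise and 1-way marginals. A minimal sketch over empirical counts (the function name and dict representation are mine):

```python
from collections import Counter
from itertools import product

def kirkwood(samples):
    """Kirkwood superposition approximation of a 3-way joint
    distribution from its pairwise and 1-way marginals:

        P^(a, b, c) = P(a, b) P(b, c) P(a, c) / (P(a) P(b) P(c))

    With empirical counts the sample-size factors cancel, so the
    formula reduces to a ratio of counts. The result need not sum to
    one; a large gap between it and the directly estimated joint is
    the kind of part-to-whole deviation that flags a 3-way interaction.
    """
    p_ab = Counter((a, b) for a, b, c in samples)
    p_bc = Counter((b, c) for a, b, c in samples)
    p_ac = Counter((a, c) for a, b, c in samples)
    p_a = Counter(a for a, b, c in samples)
    p_b = Counter(b for a, b, c in samples)
    p_c = Counter(c for a, b, c in samples)
    approx = {}
    for a, b, c in product(p_a, p_b, p_c):
        denom = p_a[a] * p_b[b] * p_c[c]
        if denom:
            approx[(a, b, c)] = (p_ab[(a, b)] * p_bc[(b, c)]
                                 * p_ac[(a, c)]) / denom
    return approx
```

On XOR-distributed data, for instance, the approximation spreads mass uniformly (1/8 per cell) while the empirical joint concentrates on four cells, exposing a purely 3-way interaction that no pairwise model can capture.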
Artificial Intelligence | 2007
Martin Možina; Jure Žabkar; Ivan Bratko
We present a novel approach to machine learning, called ABML (argumentation based ML). This approach combines machine learning from examples with concepts from the field of argumentation. The idea is to provide experts' arguments, or reasons, for some of the learning examples. We require that the theory induced from the examples explains the examples in terms of the given reasons. Thus arguments constrain the combinatorial search among possible hypotheses, and also direct the search towards hypotheses that are more comprehensible in the light of the expert's background knowledge. In this paper we realize the idea of ABML as rule learning. We implement ABCN2, an argument-based extension of the CN2 rule learning algorithm, conduct experiments, and analyze its performance in comparison with the original CN2 algorithm.
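The core constraint can be sketched in a few lines: a candidate rule "explains" an argued example only if the expert's reasons for that example appear among the rule's conditions, which restricts the rule search to argument-consistent hypotheses. This is a toy illustration with invented names, not the ABCN2 algorithm itself:

```python
def explains_with_argument(rule_conditions, argument_reasons):
    """True iff every expert-given reason is part of the rule."""
    return set(argument_reasons) <= set(rule_conditions)

def admissible_rules(candidate_rules, argument_reasons):
    """Filter candidate rules down to the argument-consistent ones,
    narrowing the combinatorial search described in the abstract."""
    return [rule for rule in candidate_rules
            if explains_with_argument(rule, argument_reasons)]
```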
Machine Learning | 2012
Stephen Muggleton; Luc De Raedt; David Poole; Ivan Bratko; Peter A. Flach; Katsumi Inoue; Ashwin Srinivasan
Inductive Logic Programming (ILP) is an area of Machine Learning which has now reached its twentieth year. Using the analogy of a human biography, this paper recalls the development of the subject from its infancy through childhood and teenage years. We show how in each phase ILP has been characterised by an attempt to extend theory and implementations in tandem with the development of novel and challenging real-world applications. Lastly, by projection we suggest directions for research which will help the subject come of age.
Bioinformatics | 2003
Blaz Zupan; Janez Demšar; Ivan Bratko; Peter Juvan; John A. Halter; Adam Kuspa; Gad Shaulsky
Motivation: Genetic networks are often used in the analysis of biological phenomena. In classical genetics, they are constructed manually from experimental data on mutants. The field lacks a formalism to guide such analysis, and accounting for all the data becomes complicated when large amounts of data are considered.
Results: We have developed GenePath, an intelligent assistant that automates the analysis of genetic data. GenePath employs expert-defined patterns to uncover gene relations from the data, and uses these relations as constraints in the search for a plausible genetic network. GenePath formalizes genetic data analysis, and facilitates both the consistent consideration of all the available data and the examination of the large number of possible consequences of planned experiments. It also provides an explanation mechanism that traces every finding to the pertinent data.
Availability: GenePath can be accessed at http://genepath.org.
Supplementary information: Supplementary material is available at http://genepath.org/bi-.supp.
Data Mining and Knowledge Discovery | 2006
Gregor Leban; Blaž Zupan; Gaj Vidmar; Ivan Bratko
Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections showing different attribute subsets that must be evaluated by the data analyst. In this paper, we introduce a method called VizRank, which is applied on classified data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of a k-nearest neighbor classifier on the data set consisting of the x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results which show that VizRank's ranking of projections agrees closely with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics.
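The scoring step can be sketched in a few lines, assuming simple scatterplot projections (one attribute per axis) and leave-one-out k-NN accuracy as the measure of visual class separation; all names here are illustrative rather than taken from the VizRank implementation:

```python
from collections import Counter
from itertools import combinations
from math import dist

def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on 2-D points."""
    correct = 0
    for i, p in enumerate(points):
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]))[:k]
        vote = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        correct += vote == labels[i]
    return correct / len(points)

def rank_projections(data, labels, k=3):
    """Score every attribute pair by how well its scatterplot
    separates the classes; higher accuracy = more useful projection."""
    scores = []
    for x, y in combinations(range(len(data[0])), 2):
        points = [(row[x], row[y]) for row in data]
        scores.append((knn_loo_accuracy(points, labels, k), (x, y)))
    return sorted(scores, reverse=True)
```

Ranking every pair is quadratic in the number of attributes, which is why automating the evaluation matters: the analyst inspects only the top-ranked plots instead of all of them.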
european conference on artificial intelligence | 1999
Blaz Zupan; Janez Demšar; Michael W. Kattan; J. Robert Beck; Ivan Bratko
Machine learning techniques have recently received considerable attention, especially when used for the construction of prediction models from data. Despite their potential advantages over standard statistical methods, such as their ability to model non-linear relationships and construct symbolic and interpretable models, their applications to survival analysis are at best rare, primarily because of the difficulty of appropriately handling censored data. In this paper we propose a schema that enables the use of classification methods, including machine learning classifiers, for survival analysis. To appropriately consider the follow-up time and censoring, we propose a technique that, for patients for whom the event did not occur and who have short follow-up times, estimates their probability of event and assigns them a distribution of outcome accordingly. Since most machine learning techniques do not deal with outcome distributions, the schema is implemented using weighted examples. To show the utility of the proposed technique, we investigate a particular problem of building prognostic models for prostate cancer recurrence, where the sole prediction of the probability of event (and not its dependency on time) is of interest. A case study on preoperative and postoperative prostate cancer recurrence prediction shows that by incorporating this weighting technique the machine learning tools stand beside modern statistical methods and may, by inducing symbolic recurrence models, provide further insight into relationships within the modeled data.
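The weighting schema can be illustrated as follows, assuming a caller-supplied estimate of the probability that a patient censored at time t would eventually experience the event (in the paper this estimate comes from survival analysis; here it is a hypothetical stand-in):

```python
def expand_with_weights(patients, p_event_given_censoring_time):
    """Turn a censored survival dataset into weighted class examples.

    patients: iterable of (features, follow_up_time, had_event)
    p_event_given_censoring_time: callable t -> estimated probability
        that a patient censored at time t would eventually experience
        the event (a stand-in for the survival-based estimate)

    Returns (features, outcome, weight) triples usable by any learner
    that accepts example weights.
    """
    examples = []
    for features, follow_up, had_event in patients:
        if had_event:
            examples.append((features, 1, 1.0))      # observed recurrence
        else:
            p = p_event_given_censoring_time(follow_up)
            if p > 0:                                # may still recur
                examples.append((features, 1, p))
            examples.append((features, 0, 1.0 - p))
    return examples
```

Each censored patient thus contributes to both outcome classes in proportion to the estimated event probability, while patients with long event-free follow-up receive nearly all their weight on the non-recurrence class.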