Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Remco R. Bouckaert is active.

Publication


Featured research published by Remco R. Bouckaert.


Bioinformatics | 2010

DensiTree: making sense of sets of phylogenetic trees

Remco R. Bouckaert

MOTIVATION: Bayesian analysis through programs like BEAST (Drummond and Rambaut, 2007) and MrBayes (Huelsenbeck et al., 2001) provides a powerful method for reconstruction of evolutionary relationships. One of the benefits of Bayesian methods is that well-founded estimates of uncertainty in models can be made available. So, for example, not only the mean time of a most recent common ancestor (tMRCA) is estimated, but also the spread. This distribution over model space is represented by a set of trees, which can be rather large and difficult to interpret. DensiTree is a tool that helps navigate these sets of trees.

RESULTS: The main idea behind DensiTree is to draw all trees in the set transparently. As a result, areas where many of the trees agree in topology and branch lengths show up as highly colored areas, while areas with little agreement show up as webs. This makes it possible to quickly get an impression of properties of the tree set, such as well-supported clades, the distribution of tMRCA and areas of topological uncertainty. Thus, DensiTree provides a quick method for qualitative analysis of tree sets.

AVAILABILITY: DensiTree is freely available from http://compevol.auckland.ac.nz/software/DensiTree/. The program is licensed under GPL and source code is available.

CONTACT: [email protected]
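The overlay idea lends itself to a short illustration. Below is a minimal sketch (in Python with matplotlib, not DensiTree's own code) of the transparency principle: many sampled trees are drawn on top of each other with a low alpha value, so regions where the trees agree accumulate colour while disagreement smears into a web. The three-taxon topology and the sampled node heights are invented purely for illustration.

import numpy as np
import matplotlib.pyplot as plt

# Invented posterior sample: 300 clock trees on taxa A, B, C sharing the
# topology ((A,B),C) but with varying node heights.
rng = np.random.default_rng(1)
n_trees = 300
x = {"A": 0.0, "B": 1.0, "C": 2.0}               # fixed horizontal tip positions

fig, ax = plt.subplots(figsize=(5, 4))
for _ in range(n_trees):
    t_ab = rng.gamma(20, 0.05)                   # height of the (A,B) ancestor
    t_root = t_ab + rng.gamma(10, 0.1)           # root height
    x_ab = (x["A"] + x["B"]) / 2
    x_root = (x_ab + x["C"]) / 2
    segments = [((x["A"], 0), (x_ab, t_ab)),     # A up to the (A,B) ancestor
                ((x["B"], 0), (x_ab, t_ab)),     # B up to the (A,B) ancestor
                ((x["C"], 0), (x_root, t_root)), # C up to the root
                ((x_ab, t_ab), (x_root, t_root))]
    for (x0, y0), (x1, y1) in segments:
        ax.plot([x0, x1], [y0, y1], color="steelblue", alpha=0.03, lw=1)

ax.set_xticks(list(x.values()))
ax.set_xticklabels(list(x.keys()))
ax.set_ylabel("time before present")
ax.set_title("Dense colour = high agreement across the tree set")
plt.show()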


pacific-asia conference on knowledge discovery and data mining | 2004

Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms

Remco R. Bouckaert; Eibe Frank

Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test indicates a difference when it should not) and Type II error (how often it indicates no difference when it should). In this paper we argue that the replicability of a test is also of importance. We say that a test has low replicability if its outcome strongly depends on the particular random partitioning of the data that is used to perform it. We present empirical measures of replicability and use them to compare the performance of several popular tests in a realistic setting involving standard learning algorithms and benchmark datasets. Based on our results we give recommendations on which test to use.
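The notion of replicability can be made concrete with a small sketch. The code below (an illustration only, not the exact measures or corrected tests studied in the paper) runs the same paired t-test on cross-validated scores of two standard learners under several different random partitionings of a benchmark dataset and reports how consistent the significant/not-significant outcome is; the dataset, learners and threshold are arbitrary choices.

import numpy as np
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
outcomes = []
for seed in range(10):                                   # 10 different partitionings
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    a = cross_val_score(GaussianNB(), X, y, cv=cv)
    b = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
    _, p = stats.ttest_rel(a, b)                         # paired t-test on fold scores
    outcomes.append(p < 0.05)

# A crude replicability score: fraction of pairs of runs that agree on the outcome.
outcomes = np.array(outcomes)
agree = np.mean([outcomes[i] == outcomes[j]
                 for i in range(len(outcomes)) for j in range(i + 1, len(outcomes))])
print(f"significant in {outcomes.sum()}/10 runs, pairwise agreement {agree:.2f}")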


european conference on symbolic and quantitative approaches to reasoning and uncertainty | 1993

Probabilistic Network Construction Using the Minimum Description Length Principle

Remco R. Bouckaert

This paper presents a procedure for the construction of probabilistic networks from a database of observations based on the minimum description length principle. On top of the advantages of the Bayesian approach, the minimum description length principle offers the advantage that every probabilistic network structure that represents the same set of independencies is assigned the same quality. This makes it very suitable for the order optimization procedure described in [4]. Preliminary test results show that the algorithm performs comparably to the algorithm based on the Bayesian approach [6].
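The scoring idea can be sketched in a few lines. Below, an MDL-style score for a candidate structure over binary variables is computed as the data log-likelihood penalised by (log N / 2) times the number of free parameters; the structures A -> B and B -> A, which represent the same independencies, receive the same score, which is the property highlighted above. The data and structures are invented, and the paper's exact description length differs in its details.

import numpy as np
from itertools import product

def mdl_score(data, structure):
    """data: (N, n_vars) array of 0/1 values; structure: dict child -> list of parents."""
    n = data.shape[0]
    loglik, n_params = 0.0, 0
    for child, parents in structure.items():
        n_params += (2 - 1) * 2 ** len(parents)          # free parameters for this child
        for config in product([0, 1], repeat=len(parents)):
            mask = np.all(data[:, parents] == config, axis=1) if parents else np.ones(n, bool)
            total = mask.sum()
            if total == 0:
                continue
            for v in (0, 1):
                count = np.sum(data[mask, child] == v)
                if count > 0:
                    loglik += count * np.log(count / total)
    return loglik - 0.5 * np.log(n) * n_params            # MDL-style penalised score

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 500)
b = (a ^ (rng.random(500) < 0.1)).astype(int)            # B is a noisy copy of A
data = np.column_stack([a, b])

print(mdl_score(data, {0: [], 1: [0]}))                  # A -> B
print(mdl_score(data, {1: [], 0: [1]}))                  # B -> A: same independencies, same score
print(mdl_score(data, {0: [], 1: []}))                   # no edge: lower score here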


european conference on principles of data mining and knowledge discovery | 2006

Naive Bayes for text classification with unbalanced classes

Eibe Frank; Remco R. Bouckaert

Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1,2]. In this paper we present another transformation that is designed to combat a potential problem with the application of MNB to unbalanced datasets. We propose an appropriate correction by adjusting attribute priors. This correction can be implemented as another data normalization step, and we show that it can significantly improve the area under the ROC curve. We also show that the modified version of MNB is very closely related to the simple centroid-based classifier and compare the two methods empirically.
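For orientation, here is a hedged sketch of the evaluation setup only: multinomial naive Bayes on a deliberately unbalanced two-class text problem, scored by area under the ROC curve, with and without a per-document normalisation step. The length normalisation used here is just a stand-in data transformation and is not the attribute-prior correction derived in the paper; the dataset, class ratio and split are arbitrary.

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import normalize

cats = ["sci.space", "rec.autos"]
raw = fetch_20newsgroups(subset="train", categories=cats, remove=("headers", "footers"))
X = CountVectorizer(min_df=2).fit_transform(raw.data)
y = (np.array(raw.target) == 0).astype(int)

# Make the positive class rare to mimic an unbalanced corpus.
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
keep = np.concatenate([neg, pos[:len(pos) // 10]])
X, y = X[keep], y[keep]

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
for name, transform in [("raw counts", lambda m: m),
                        ("L2-normalised counts", lambda m: normalize(m, norm="l2"))]:
    clf = MultinomialNB().fit(transform(Xtr), ytr)
    scores = clf.predict_proba(transform(Xte))[:, 1]
    print(name, "AUC =", round(roc_auc_score(yte, scores), 3))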


uncertainty in artificial intelligence | 1994

Properties of Bayesian belief network learning algorithms

Remco R. Bouckaert

In this paper the behavior of various belief network learning algorithms is studied. Selecting belief networks with certain minimality properties turns out to be NP-hard, which justifies the use of search heuristics. Search heuristics based on the Bayesian measure of Cooper and Herskovits and a minimum description length (MDL) measure are compared with respect to their properties for both limiting and finite database sizes. It is shown that the MDL measure has more desirable properties than the Bayesian measure. Experimental results suggest that for learning the probabilities of belief networks, smoothing is helpful.
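The closing remark about smoothing is easy to illustrate: maximum likelihood estimates of a conditional probability table can assign probability zero to configurations that happen not to occur in a small database, while adding a pseudo-count avoids this. The sketch below uses an invented binary network fragment and Laplace-style smoothing; the paper's experiments of course use full belief networks.

import numpy as np

def cpt(data_a, data_b, alpha=0.0):
    """Estimate P(B = b | A = a) for binary A, B with pseudo-count alpha."""
    table = np.zeros((2, 2))
    for a in (0, 1):
        for b in (0, 1):
            count = np.sum((data_a == a) & (data_b == b))
            table[a, b] = count + alpha
        table[a] /= table[a].sum()                 # normalise the row for A = a
    return table

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 30)
b = np.where(a == 1, 1, rng.integers(0, 2, 30))    # B = 1 whenever A = 1 in this small sample

print("ML estimate:\n", cpt(a, b))                 # contains a hard zero for P(B=0 | A=1)
print("Laplace-smoothed:\n", cpt(a, b, alpha=1.0)) # zero replaced by a small probability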


australasian joint conference on artificial intelligence | 2004

Naive Bayes classifiers that perform well with continuous variables

Remco R. Bouckaert

There are three main methods for handling continuous variables in naive Bayes classifiers, namely the normal method (parametric approach), the kernel method (non-parametric approach) and discretization. In this article, we perform a methodologically sound comparison of the three methods, which shows large mutual differences between the methods and no single method being universally better. This suggests that a method for selecting one of the three approaches to continuous variables could improve overall performance of the naive Bayes classifier. We present three such selection methods that can be implemented efficiently: v-fold cross-validation for the normal, kernel and discretization method. Empirical evidence suggests that selection using 10-fold cross-validation (especially when repeated 10 times) can largely and significantly improve the overall performance of naive Bayes classifiers and consistently outperform any of the three popular methods for dealing with continuous variables on their own. This is remarkable, since selection among more classifiers does not consistently result in better accuracy.
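As a rough illustration of the selection idea, the sketch below uses 10-fold cross-validation on the training data to choose between a Gaussian ("normal") treatment and a discretisation treatment of continuous attributes, then evaluates the winner on held-out data. The kernel-density variant and the repeated 10x10 protocol are omitted to keep the sketch short; the dataset, bin count and split are arbitrary choices.

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_wine(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

candidates = {
    "normal": GaussianNB(),
    "discretised": make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
        CategoricalNB(min_categories=5),     # min_categories avoids unseen-bin errors across folds
    ),
}
cv_means = {name: cross_val_score(model, Xtr, ytr, cv=10).mean()
            for name, model in candidates.items()}
best = max(cv_means, key=cv_means.get)
print("cross-validation accuracies:", {k: round(v, 3) for k, v in cv_means.items()})
print("selected:", best,
      "held-out accuracy:", round(candidates[best].fit(Xtr, ytr).score(Xte, yte), 3))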


international conference on machine learning | 2004

Estimating replicability of classifier learning experiments

Remco R. Bouckaert

Replicability of machine learning experiments measures how likely it is that the outcome of one experiment is repeated when performed with a different randomization of the data. In this paper, we present an efficient estimator of the replicability of an experiment. More precisely, the estimator is unbiased and has the lowest variance in the class of estimators formed by a linear combination of outcomes of experiments on a given data set. We gathered empirical data for comparing experiments consisting of different sampling schemes and hypothesis tests. Both factors are shown to have an impact on the replicability of experiments. The data suggest that sign tests should not be used due to low replicability. Ranked sum tests show better performance, but the combination of a sorted-runs sampling scheme with a t-test gives the most desirable performance judged on Type I and II error and replicability.


International Journal of Approximate Reasoning | 1996

A modified simulation scheme for inference in Bayesian networks

Remco R. Bouckaert; Enrique Castillo; José Manuel Gutiérrez

We introduce an approximation method for uncertainty propagation based on a modification of the stratified simulation scheme. The method uses a deterministic or perfect sample and calculates the number of times simulated instantiations are selected, avoiding the repetition of identical instantiations which occurs in the standard stratified simulation method. A theoretical analysis is presented to evaluate the performance of the method in comparison with the stratified simulation scheme. The analysis gives a technique to select the required step for the estimation of probabilities with a given error. Some experimental studies compare the proposed method with other simulation methods and show a large performance improvement in computation time as well as in simulation errors.
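The flavour of the scheme can be conveyed on a toy two-node network A -> B: the joint instantiations are laid out on the unit interval according to their probabilities, and a deterministic, evenly spaced "perfect sample" is used to count how many times each instantiation is selected, so identical instantiations are never simulated repeatedly. All numbers below are invented, and the paper's method handles general networks and analyses the estimation error; this is only a sketch of the underlying idea, compared against plain random sampling.

import numpy as np
from itertools import product

p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}

insts = list(product([0, 1], repeat=2))                       # all (a, b) instantiations
joint = np.array([p_a[a] * p_b_given_a[a][b] for a, b in insts])
edges = np.concatenate([[0.0], np.cumsum(joint)])             # intervals on [0, 1]

m = 1000
u = (np.arange(m) + 0.5) / m                                  # deterministic, evenly spaced sample
counts = np.histogram(u, bins=edges)[0]                       # selections per instantiation
est_stratified = counts[[b == 1 for _, b in insts]].sum() / m

rng = np.random.default_rng(0)
a = rng.random(m) < p_a[1]
b = rng.random(m) < np.where(a, p_b_given_a[1][1], p_b_given_a[0][1])
est_random = b.mean()

exact = sum(p_a[v] * p_b_given_a[v][1] for v in (0, 1))
print(f"exact P(B=1)={exact:.3f}  stratified={est_stratified:.3f}  random={est_random:.3f}")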


uncertainty in artificial intelligence | 1994

A stratified simulation scheme for inference in Bayesian belief networks

Remco R. Bouckaert

Simulation schemes for probabilistic inference in Bayesian belief networks offer many advantages over exact algorithms; for example, these schemes have a linear and thus predictable runtime while exact algorithms have exponential runtime. Experiments have shown that likelihood weighting is one of the most promising simulation schemes. In this paper, we present a new simulation scheme that generates samples more evenly spread in the sample space than the likelihood weighting scheme. We show both theoretically and experimentally that the stratified scheme outperforms likelihood weighting in average runtime and error in estimates of beliefs.
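For reference, a minimal sketch of the likelihood weighting baseline on a toy network A -> B with evidence B = 1: the non-evidence variable is sampled from its prior and each sample is weighted by the likelihood of the evidence. The stratified scheme proposed in the paper replaces the random draws with more evenly spread samples, which is not reproduced here; all probabilities are invented.

import numpy as np

p_a1 = 0.3
p_b1_given_a = {1: 0.8, 0: 0.2}

rng = np.random.default_rng(0)
n = 5000
a = (rng.random(n) < p_a1).astype(int)                  # sample the non-evidence variable A
w = np.where(a == 1, p_b1_given_a[1], p_b1_given_a[0])  # weight = P(B=1 | A=a)

est = np.sum(w * (a == 1)) / np.sum(w)                  # weighted estimate of P(A=1 | B=1)
exact = p_a1 * p_b1_given_a[1] / (p_a1 * p_b1_given_a[1] + (1 - p_a1) * p_b1_given_a[0])
print(f"likelihood weighting: {est:.3f}   exact: {exact:.3f}")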


uncertainty in artificial intelligence | 1992

Optimizing causal orderings for generating DAGs from data

Remco R. Bouckaert

An algorithm for generating the structure of a directed acyclic graph from data using the notion of causal input lists is presented. The algorithm manipulates the ordering of the variables with operations that closely resemble arc reversal. An operation is applied only if the DAG after the operation represents at least the independencies represented by the DAG before it, and operations are repeated until no more arcs can be removed from the DAG. The resulting DAG is a minimal I-map.
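The arc-reversal style operation can be sketched compactly: when an arc a -> b is reversed, each endpoint inherits the other's parents, so the resulting DAG still represents at least the independencies of the original. The toy DAG below is invented, and the paper's full procedure additionally removes arcs and iterates until a minimal I-map for an optimized ordering is obtained.

def reverse_arc(parents, a, b):
    """parents: dict node -> set of parents, with a -> b currently in the DAG."""
    assert a in parents[b], "arc a -> b must exist"
    new = {v: set(ps) for v, ps in parents.items()}
    new[b].discard(a)
    new[a].add(b)                      # the reversed arc b -> a
    new[a] |= parents[b] - {a}         # a inherits b's other parents
    new[b] |= parents[a]               # b inherits a's parents
    return new

dag = {"x": set(), "y": {"x"}, "z": {"x", "y"}}   # x -> y, x -> z, y -> z
print(reverse_arc(dag, "y", "z"))                  # reverse y -> z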

Collaboration


Dive into Remco R. Bouckaert's collaborations.

Top Co-Authors

Milan Studený

Academy of Sciences of the Czech Republic
