Alexandros Kalousis
University of Applied Sciences Western Switzerland
Publications
Featured research published by Alexandros Kalousis.
Knowledge and Information Systems | 2007
Alexandros Kalousis; Julien Prados; Melanie Hilario
With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.
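The stability measures studied in the paper are not reproduced here, but the following minimal sketch (my own illustration on synthetic data, using scikit-learn's univariate F-score as the feature selector) shows the general recipe the abstract describes: select the top-k features on repeated subsamples of the training set and compare the selected subsets pairwise, here with the Jaccard index.

```python
import numpy as np
from itertools import combinations
from sklearn.feature_selection import f_classif

def select_top_k(X, y, k):
    # univariate F-score selector used purely for illustration
    scores, _ = f_classif(X, y)
    return set(np.argsort(scores)[-k:].tolist())

def subset_stability(X, y, k=50, n_runs=20, frac=0.8, seed=0):
    # select features on repeated subsamples and average pairwise Jaccard similarity
    rng = np.random.default_rng(seed)
    n = len(y)
    subsets = []
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        subsets.append(select_top_k(X[idx], y[idx], k))
    sims = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return float(np.mean(sims))

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 1000))   # 80 samples, 1000 features (synthetic)
y = rng.integers(0, 2, size=80)
print(subset_stability(X, y))
```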
Briefings in Bioinformatics | 2007
Melanie Hilario; Alexandros Kalousis
Mass-spectra based proteomic profiles have received widespread attention as potential tools for biomarker discovery and early disease diagnosis. A major data-analytical problem involved is the extremely high dimensionality (i.e. number of features or variables) of proteomic data, in particular when the sample size is small. This article reviews dimensionality reduction methods that have been used in proteomic biomarker studies. It then focuses on the problem of selecting the most appropriate method for a specific task or dataset, and proposes method combination as a potential alternative to single-method selection. Finally, it points out the potential of novel dimension reduction techniques, in particular those that incorporate domain knowledge through the use of informative priors or causal inference.
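As a hedged illustration of the method-combination idea mentioned above (my own sketch on synthetic data, not code from the review), a univariate filter can be chained with a feature-extraction step such as PCA:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))   # 60 samples, 5000 spectral features (synthetic)
y = rng.integers(0, 2, size=60)   # binary class labels (synthetic)

# combine a feature-selection method with a feature-extraction method
combo = Pipeline([
    ("filter", SelectKBest(f_classif, k=200)),  # keep the 200 best-scoring features
    ("pca", PCA(n_components=10)),              # then extract 10 components
])
Z = combo.fit_transform(X, y)
print(Z.shape)  # (60, 10)
```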
BMC Bioinformatics | 2010
Mohammed Dakna; Keith Harris; Alexandros Kalousis; Sebastien Carpentier; Walter Kolch; Joost P. Schanstra; Marion Haubitz; Antonia Vlahou; Harald Mischak; Mark A. Girolami
Background: The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As a tractable example we define the measurable proteomic differences between apparently healthy adult males and females. We choose urine as the body fluid of interest and CE-MS, a thoroughly validated platform technology allowing for routine analysis of a large number of samples. The second urine of the morning was collected from apparently healthy male and female volunteers (aged 21-40) in the course of the routine medical check-up before recruitment at the Hannover Medical School. Results: We found that the Wilcoxon test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size estimation can be performed based on a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any results in an independent test set is essential. Conclusions: Valid proteomic biomarkers for diagnosis and prognosis can only be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.
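A minimal sketch of the per-feature testing step described above (assumptions and data are mine, not the authors'): a Wilcoxon rank-sum test, computed here via its Mann-Whitney U equivalent, for each feature, followed by Benjamini-Hochberg adjustment for multiple testing.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
males = rng.normal(size=(40, 300))      # 40 male samples, 300 peptide features (synthetic)
females = rng.normal(size=(45, 300))
females[:, :10] += 1.0                  # plant 10 genuinely different features

# Wilcoxon rank-sum (Mann-Whitney U) test per feature
pvals = np.array([
    mannwhitneyu(males[:, j], females[:, j], alternative="two-sided").pvalue
    for j in range(males.shape[1])
])

# adjust for multiple testing (Benjamini-Hochberg FDR)
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("candidate biomarkers:", np.flatnonzero(reject))
```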
Intelligent Data Analysis | 1999
Alexandros Kalousis; Theoharis Theoharis
The selection of an appropriate classification model and algorithm is crucial for effective knowledge discovery on a dataset. For large databases, common in data mining, such a selection is necessary because the cost of invoking all alternative classifiers is prohibitive. This selection task is impeded by two factors. First, there are many performance criteria, and the behaviour of a classifier varies considerably with them. Second, a classifier's performance is strongly affected by the characteristics of the dataset. Classifier selection implies mastering a lot of background information on the dataset, the models and the algorithms in question. An intelligent assistant can reduce this effort by inducing helpful suggestions from background information. In this study, we present such an assistant, NOEMON. For each registered classifier, NOEMON measures its performance for a collection of datasets. Rules are induced from those measurements and accommodated in a knowledge base. The suggestion on the most appropriate classifiers for a dataset is then based on those rules. Results on the performance of an initial prototype are also given.
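The sketch below is not NOEMON itself; it is a small illustration, under simple assumptions of my own, of the same recipe: record dataset characteristics together with the best-performing base classifier on a collection of (here synthetic) datasets, then induce a meta-model that suggests a classifier for a new dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

base_learners = {
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}

def meta_features(X, y):
    # simple dataset characteristics: size, dimensionality, number of classes
    return [X.shape[0], X.shape[1], len(np.unique(y))]

meta_X, meta_y = [], []
for seed in range(20):                       # 20 synthetic "registered" datasets
    X, y = make_classification(n_samples=150 + 10 * seed, n_features=10 + seed,
                               n_informative=5, random_state=seed)
    scores = {name: cross_val_score(clf, X, y, cv=3).mean()
              for name, clf in base_learners.items()}
    meta_X.append(meta_features(X, y))
    meta_y.append(max(scores, key=scores.get))   # label = best base learner on this dataset

meta_model = DecisionTreeClassifier(random_state=0).fit(meta_X, meta_y)

X_new, y_new = make_classification(n_samples=300, n_features=25,
                                   n_informative=5, random_state=99)
print("suggested classifier:", meta_model.predict([meta_features(X_new, y_new)])[0])
```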
Machine Learning | 2004
Alexandros Kalousis; João Gama; Melanie Hilario
In this paper we address two symmetrical issues: the discovery of similarities among classification algorithms, and among datasets. Both rest on error measures, which we use to define the error correlation between two algorithms and to determine the relative performance of a list of algorithms. We use the first to discover similarities between learners, and both of them to discover similarities between datasets. The latter sketch maps of the dataset space. Regions within each map exhibit specific patterns of error correlation or relative performance. To acquire an understanding of the factors determining these regions, we describe them using simple characteristics of the datasets. Descriptions of each region are given in terms of the distributions of dataset characteristics within it.
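A small illustration (mine, on a synthetic dataset) of the error-correlation idea: obtain per-instance predictions of two classifiers on the same cross-validation folds, turn them into 0/1 error vectors, and correlate those vectors; high correlation means the two algorithms tend to fail on the same instances.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# per-instance 0/1 errors of two learners on the same folds
err_nb = (cross_val_predict(GaussianNB(), X, y, cv=5) != y).astype(int)
err_tree = (cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=5) != y).astype(int)

# error correlation between the two algorithms on this dataset
print(np.corrcoef(err_nb, err_tree)[0, 1])
```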
European Conference on Machine Learning | 2001
Hilan Bensusan; Alexandros Kalousis
This paper investigates the use of meta-learning to estimate the predictive accuracy of a classifier. We present a scenario where meta-learning is seen as a regression task and consider its potential in connection with three strategies of dataset characterization. We show that it is possible to estimate classifier performance with a high degree of confidence and gain knowledge about the classifier through the regression models generated. We exploit the results of the models to predict the ranking of the inducers. We also show that the best strategy for performance estimation is not necessarily the best one for ranking generation.
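A compact sketch (my own, on synthetic meta-data) of meta-learning cast as regression: one regression model per classifier predicts its accuracy from dataset characteristics, and the predicted accuracies induce a ranking that can be compared with the true one.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_datasets, n_meta_features, n_classifiers = 60, 5, 4
Z = rng.normal(size=(n_datasets, n_meta_features))        # dataset characteristics (synthetic)
W = rng.normal(size=(n_meta_features, n_classifiers))
acc = 0.7 + 0.05 * np.tanh(Z @ W) + 0.01 * rng.normal(size=(n_datasets, n_classifiers))

# one regression meta-model per classifier, fitted on all but the last dataset
models = [RandomForestRegressor(random_state=0).fit(Z[:-1], acc[:-1, c])
          for c in range(n_classifiers)]
pred = np.array([m.predict(Z[-1:])[0] for m in models])   # predicted accuracies on the held-out dataset

print("predicted ranking:", np.argsort(-pred))
print("true ranking:     ", np.argsort(-acc[-1]))
print("rank correlation: ", spearmanr(pred, acc[-1]).correlation)
```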
International Conference on Data Mining | 2005
Alexandros Kalousis; Julien Prados; Melanie Hilario
With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.
Conference on Tools with Artificial Intelligence | 2000
Alexandros Kalousis; Melanie Hilario
The selection of an appropriate inducer is crucial for performing effective classification. In previous work we presented a system called NOEMON, which relied on a mapping between dataset characteristics and inducer performance to propose inducers for specific datasets. Instance-based learning was used to create that mapping. Here we extend and refine the set of data characteristics; we also use a wider range of base-level inducers and a much larger collection of datasets to create the meta-models. We compare the performance of meta-models produced by instance-based learners, decision trees and boosted decision trees. The results show that decision-tree and boosted decision-tree models enhance the performance of the system.
Meta-Learning in Computational Intelligence | 2011
Melanie Hilario; Phong Nguyen; Huyen Do; Adam Woznica; Alexandros Kalousis
This chapter describes a principled approach to meta-learning that has three distinctive features. First, whereas most previous work on meta-learning focused exclusively on the learning task, our approach applies meta-learning to the full knowledge discovery process and is thus more aptly referred to as meta-mining. Second, traditional meta-learning regards learning algorithms as black boxes and essentially correlates properties of their input (data) with the performance of their output (learned model). We propose to tear open the black box and analyse algorithms in terms of their core components, their underlying assumptions, the cost functions and optimization strategies they use, and the models and decision boundaries they generate. Third, to ground meta-mining on a declarative representation of the data mining (DM) process and its components, we built a DM ontology and knowledge base using the Web Ontology Language (OWL).
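The following toy sketch (property values are illustrative placeholders of mine, not taken from the chapter) shows the flavour of describing algorithms by their components rather than as opaque labels, so that such descriptors can serve as features in meta-mining:

```python
# Illustrative component-level descriptors (placeholder values, not DMOP content)
algorithm_descriptors = {
    "C4.5":        {"model": "decision_tree", "cost": "information_gain", "search": "greedy"},
    "linear_SVM":  {"model": "linear",        "cost": "hinge_loss",       "search": "convex_qp"},
    "naive_bayes": {"model": "probabilistic", "cost": "likelihood",       "search": "closed_form"},
}

# one-hot encode the descriptors so they can be fed to a meta-level learner
properties = ("model", "cost", "search")
vocab = {p: sorted({d[p] for d in algorithm_descriptors.values()}) for p in properties}

def encode(descriptor):
    return [int(descriptor[p] == v) for p in properties for v in vocab[p]]

for name, d in algorithm_descriptors.items():
    print(name, encode(d))
```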
Journal of Web Semantics | 2015
C. Maria Keet; Agnieszka Ławrynowicz; Claudia d’Amato; Alexandros Kalousis; Phong Nguyen; Raúl Palma; Robert Stevens; Melanie Hilario
The Data Mining OPtimization Ontology (DMOP) has been developed to support informed decision-making at various choice points of the data mining process. The ontology can be used by data miners and deployed in ontology-driven information systems. The primary purpose for which DMOP has been developed is the automation of algorithm and model selection through semantic meta-mining that makes use of an ontology-based meta-analysis of complete data mining processes in view of extracting patterns associated with mining performance. To this end, DMOP contains detailed descriptions of data mining tasks (e.g., learning, feature selection), data, algorithms, hypotheses such as mined models or patterns, and workflows. A development methodology was used for DMOP, including items such as competency questions and foundational ontology reuse. Several non-trivial modeling problems were encountered, and due to the complexity of the data mining details, the ontology requires the use of the OWL 2 DL profile. DMOP was successfully evaluated for semantic meta-mining and used in constructing the Intelligent Discovery Assistant, deployed in the popular data mining environment RapidMiner.
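As a hedged illustration only (the namespace, class and individual names below are placeholders, not the published DMOP IRIs), the snippet shows how such an ontology fragment can be populated and queried from Python with rdflib:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

DMOP = Namespace("http://example.org/dmop#")   # placeholder namespace, not the real DMOP IRI
g = Graph()
g.bind("dmop", DMOP)

# a tiny fragment: an algorithm class hierarchy and one individual
g.add((DMOP.FeatureSelectionAlgorithm, RDFS.subClassOf, DMOP.DataMiningAlgorithm))
g.add((DMOP.ReliefF, RDF.type, DMOP.FeatureSelectionAlgorithm))
g.add((DMOP.ReliefF, RDFS.label, Literal("ReliefF")))

# ask for everything that is (transitively) a data mining algorithm
q = """
SELECT ?alg WHERE {
  ?alg a ?cls .
  ?cls rdfs:subClassOf* dmop:DataMiningAlgorithm .
}
"""
for row in g.query(q, initNs={"dmop": DMOP, "rdfs": RDFS}):
    print(row.alg)
```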