Publication


Featured research published by Melanie Hilario.


Knowledge and Information Systems | 2007

Stability of feature selection algorithms: a study on high-dimensional spaces

Alexandros Kalousis; Julien Prados; Melanie Hilario

With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights/scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.
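
To make the subset-based notion of stability concrete, the following minimal Python sketch resamples a training set, reruns a feature selector on each resample, and averages the pairwise similarities of the selected subsets. The toy univariate selector and the Jaccard similarity used here are illustrative assumptions, not necessarily the exact measures studied in the paper.

```python
# Minimal sketch of subset-stability estimation: bootstrap the training set,
# rerun a feature selector on each resample, and average pairwise similarities
# of the selected subsets. The toy univariate selector and Jaccard similarity
# are illustrative choices, not necessarily the paper's exact measures.
from itertools import combinations
import numpy as np

def select_top_k(X, y, k=20):
    """Toy selector: rank features by absolute correlation with the target."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return set(np.argsort(scores)[-k:])

def jaccard(a, b):
    return len(a & b) / len(a | b)

def subset_stability(X, y, n_resamples=10, k=20, seed=0):
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_resamples):
        idx = rng.choice(X.shape[0], size=X.shape[0], replace=True)  # bootstrap
        subsets.append(select_top_k(X[idx], y[idx], k))
    return float(np.mean([jaccard(a, b) for a, b in combinations(subsets, 2)]))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 500))                 # high-dimensional toy data
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)
    print("estimated subset stability:", subset_stability(X, y))
```

A value near 1.0 indicates that the selector keeps choosing the same features across perturbed training sets; values near 0 indicate unstable preferences.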


Artificial Intelligence in Medicine | 2006

Learning from imbalanced data in surveillance of nosocomial infection

Gilles Cohen; Melanie Hilario; Hugo Sax; Stéphane Hugonnet; Antoine Geissbuhler

OBJECTIVE: An important problem that arises in hospitals is the monitoring and detection of nosocomial or hospital-acquired infections (NIs). This paper describes a retrospective analysis of a prevalence survey of NIs done in the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey.

METHODS AND MATERIAL: Standard surveillance strategies are time-consuming and cannot be applied hospital-wide; alternative methods are required. In NI detection viewed as a classification task, the main difficulty resides in the significant imbalance between positive or infected (11%) and negative (89%) cases. To remedy class imbalance, we explore two distinct avenues: (1) a new re-sampling approach in which both over-sampling of rare positives and under-sampling of the noninfected majority rely on synthetic cases (prototypes) generated via class-specific sub-clustering, and (2) a support vector algorithm in which asymmetrical margins are tuned to improve recognition of rare positive cases.

RESULTS AND CONCLUSION: Experiments have shown both approaches to be effective for the NI detection problem. Our novel re-sampling strategies perform remarkably better than classical random re-sampling. However, they are outperformed by asymmetrical soft-margin support vector machines, which attained a sensitivity rate of 92%, significantly better than the highest sensitivity (87%) obtained via prototype-based re-sampling.
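
The following minimal Python sketch illustrates the asymmetric-cost idea on synthetic data with the same 11%/89% imbalance, using class weights in scikit-learn's SVC as a stand-in. The weight ratio, the synthetic data, and the use of this API are illustrative assumptions; the paper tunes asymmetrical margins in the SVM formulation itself.

```python
# Minimal sketch of the asymmetric-margin idea: penalise errors on the rare
# positive (infected) class more heavily so the decision boundary favours
# sensitivity. The class-weight ratio and synthetic data are illustrative;
# the paper tunes asymmetrical margins in the SVM formulation itself rather
# than through this particular API.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n_neg, n_pos = 890, 110                              # ~89% / 11% class imbalance
X = np.vstack([rng.normal(0.0, 1.0, size=(n_neg, 5)),
               rng.normal(1.0, 1.0, size=(n_pos, 5))])
y = np.array([0] * n_neg + [1] * n_pos)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weights in (None, {0: 1.0, 1: 8.0}):             # symmetric vs asymmetric cost
    clf = SVC(kernel="rbf", class_weight=weights).fit(X_tr, y_tr)
    sensitivity = recall_score(y_te, clf.predict(X_te))
    print(f"class_weight={weights}: sensitivity on positives = {sensitivity:.2f}")
```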


Briefings in Bioinformatics | 2007

Approaches to dimensionality reduction in proteomic biomarker studies

Melanie Hilario; Alexandros Kalousis

Mass-spectra based proteomic profiles have received widespread attention as potential tools for biomarker discovery and early disease diagnosis. A major data-analytical problem involved is the extremely high dimensionality (i.e. number of features or variables) of proteomic data, in particular when the sample size is small. This article reviews dimensionality reduction methods that have been used in proteomic biomarker studies. It then focuses on the problem of selecting the most appropriate method for a specific task or dataset, and proposes method combination as a potential alternative to single-method selection. Finally, it points out the potential of novel dimension reduction techniques, in particular those that incorporate domain knowledge through the use of informative priors or causal inference.
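
As a rough illustration of method combination, the sketch below aggregates the feature rankings produced by two univariate filters instead of committing to a single method. The two criteria (absolute correlation and mutual information) and the simple rank-averaging rule are illustrative assumptions, not the article's prescription.

```python
# Rough sketch of combining two dimensionality reduction (feature selection)
# methods: rank features under each criterion and aggregate the ranks instead
# of trusting a single method. The criteria (absolute correlation and mutual
# information) and the simple rank averaging are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def ranks_from_scores(scores):
    """Rank of each feature (0 = best) given higher-is-better scores."""
    order = np.argsort(scores)[::-1]
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks

def combined_top_k(X, y, k=50):
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    mi = mutual_info_classif(X, y, random_state=0)
    mean_rank = (ranks_from_scores(corr) + ranks_from_scores(mi)) / 2.0
    return np.argsort(mean_rank)[:k]                 # indices of the k kept features

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 1000))                  # small-sample, high-dimensional
    y = (X[:, :3].sum(axis=1) > 0).astype(int)
    print(combined_top_k(X, y, k=10))
```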


Machine Learning | 2004

On Data and Algorithms: Understanding Inductive Performance

Alexandros Kalousis; João Gama; Melanie Hilario

In this paper we address two symmetrical issues: the discovery of similarities among classification algorithms, and among datasets. Both rely on error measures, which we use to define the error correlation between two algorithms and to determine the relative performance of a list of algorithms. We use the first to discover similarities between learners, and both of them to discover similarities between datasets. The latter are used to sketch maps of the dataset space. Regions within each map exhibit specific patterns of error correlation or relative performance. To acquire an understanding of the factors determining these regions, we describe them using simple characteristics of the datasets. Descriptions of each region are given in terms of the distributions of dataset characteristics within it.
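
A minimal sketch of how the error correlation between two algorithms might be estimated, assuming it is computed from per-instance 0/1 error indicators gathered under cross-validation; the learners and dataset below are illustrative and not the paper's experimental setup.

```python
# Minimal sketch of estimating the error correlation between two classifiers:
# collect per-instance 0/1 error indicators under cross-validation and compute
# their Pearson correlation. Treating error correlation this way, and the
# choice of learners and dataset, are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

def error_vector(estimator):
    preds = cross_val_predict(estimator, X, y, cv=10)
    return (preds != y).astype(float)                # 1 where the learner errs

e_tree = error_vector(DecisionTreeClassifier(random_state=0))
e_nb = error_vector(GaussianNB())

print(f"tree error rate: {e_tree.mean():.3f}, naive Bayes error rate: {e_nb.mean():.3f}")
print(f"error correlation: {np.corrcoef(e_tree, e_nb)[0, 1]:.3f}")
```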


International Conference on Data Mining | 2005

Stability of feature selection algorithms

Alexandros Kalousis; Julien Prados; Melanie Hilario

With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights/scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.


Conference on Tools with Artificial Intelligence | 2000

Model selection via meta-learning: a comparative study

Alexandros Kalousis; Melanie Hilario

The selection of an appropriate inducer is crucial for performing effective classification. In previous work we presented a system called NOEMON which relied on a mapping between dataset characteristics and inducer performance to propose inducers for specific datasets. Instance-based learning was used to create that mapping. Here we extend and refine the set of data characteristics; we also use a wider range of base-level inducers and a much larger collection of datasets to create the meta-models. We compare the performance of meta-models produced by instance-based learners, decision trees and boosted decision trees. The results show that decision-tree and boosted decision-tree models enhance the performance of the system.
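
A simplified sketch of the meta-learning mapping: each dataset is described by a few meta-features and labelled with the inducer that performed best on it, and a decision tree is trained to predict that label for new datasets. The meta-features, candidate inducers, and direct best-inducer labelling are illustrative simplifications, not NOEMON's actual dataset characteristics or pairwise meta-models.

```python
# Simplified sketch of the meta-learning mapping: describe each dataset by a
# few meta-features, label it with the best-performing inducer, and train a
# decision tree to predict that label for new datasets. The meta-features,
# candidate inducers, and direct "best inducer" labelling are illustrative
# simplifications, not NOEMON's actual characteristics or pairwise models.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

CANDIDATES = {"naive_bayes": GaussianNB(), "knn": KNeighborsClassifier()}

def meta_features(X, y):
    return [X.shape[0], X.shape[1], len(np.unique(y)), float((y == y[0]).mean())]

def best_inducer(X, y):
    scores = {name: cross_val_score(est, X, y, cv=3).mean()
              for name, est in CANDIDATES.items()}
    return max(scores, key=scores.get)

# Build a toy meta-dataset from randomly generated base-level datasets.
rng = np.random.default_rng(0)
meta_X, meta_y = [], []
for _ in range(30):
    n, d = int(rng.integers(60, 200)), int(rng.integers(5, 20))
    Xb = rng.normal(size=(n, d))
    yb = (Xb[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)
    meta_X.append(meta_features(Xb, yb))
    meta_y.append(best_inducer(Xb, yb))

meta_model = DecisionTreeClassifier(random_state=0).fit(meta_X, meta_y)
print("recommended inducer:", meta_model.predict([meta_X[0]])[0])
```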


Meta-Learning in Computational Intelligence | 2011

Ontology-Based Meta-Mining of Knowledge Discovery Workflows

Melanie Hilario; Phong Nguyen; Huyen Do; Adam Woznica; Alexandros Kalousis

This chapter describes a principled approach to meta-learning that has three distinctive features. First, whereas most previous work on meta-learning focused exclusively on the learning task, our approach applies meta-learning to the full knowledge discovery process and is thus more aptly referred to as meta-mining. Second, traditional meta-learning regards learning algorithms as black boxes and essentially correlates properties of their input (data) with the performance of their output (learned model). We propose to tear open the black box and analyse algorithms in terms of their core components, their underlying assumptions, the cost functions and optimization strategies they use, and the models and decision boundaries they generate. Third, to ground meta-mining on a declarative representation of the data mining (DM) process and its components, we built a DM ontology and knowledge base using the Web Ontology Language (OWL).
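
A minimal sketch of the declarative-representation idea, written with the rdflib library: a learning algorithm is described through its core components rather than as a black box. The namespace, classes, and properties below are hypothetical stand-ins, not DMOP's or the chapter's actual vocabulary.

```python
# Minimal sketch of the declarative idea: describe a learning algorithm by its
# core components (cost function, optimization strategy, produced model) as
# triples in an ontology-style vocabulary. The namespace, classes, and
# properties are hypothetical stand-ins, not DMOP's actual terms.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

DM = Namespace("http://example.org/dm#")             # hypothetical namespace
g = Graph()
g.bind("dm", DM)

g.add((DM.SVM, RDF.type, DM.ClassificationAlgorithm))
g.add((DM.SVM, DM.hasCostFunction, DM.HingeLoss))
g.add((DM.SVM, DM.hasOptimizationStrategy, DM.QuadraticProgramming))
g.add((DM.SVM, DM.producesModel, DM.LinearDecisionBoundary))
g.add((DM.SVM, RDFS.comment,
       Literal("Described via its components rather than as a black box.")))

print(g.serialize(format="turtle"))
```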


Journal of Web Semantics | 2015

The Data Mining OPtimization Ontology

C. Maria Keet; Agnieszka Ławrynowicz; Claudia d’Amato; Alexandros Kalousis; Phong Nguyen; Raúl Palma; Robert Stevens; Melanie Hilario

The Data Mining OPtimization Ontology (DMOP) has been developed to support informed decision-making at various choice points of the data mining process. The ontology can be used by data miners and deployed in ontology-driven information systems. The primary purpose for which DMOP has been developed is the automation of algorithm and model selection through semantic meta-mining that makes use of an ontology-based meta-analysis of complete data mining processes in view of extracting patterns associated with mining performance. To this end, DMOP contains detailed descriptions of data mining tasks (e.g., learning, feature selection), data, algorithms, hypotheses such as mined models or patterns, and workflows. A development methodology was used for DMOP, including items such as competency questions and foundational ontology reuse. Several non-trivial modeling problems were encountered and due to the complexity of the data mining details, the ontology requires the use of the OWL 2 DL profile. DMOP was successfully evaluated for semantic meta-mining and used in constructing the Intelligent Discovery Assistant, deployed at the popular data mining environment RapidMiner.


European Conference on Machine Learning | 2009

Margin and Radius Based Multiple Kernel Learning

Huyen Do; Alexandros Kalousis; Adam Woznica; Melanie Hilario

A serious drawback of kernel methods, and Support Vector Machines (SVM) in particular, is the difficulty in choosing a suitable kernel function for a given dataset. One of the approaches proposed to address this problem is Multiple Kernel Learning (MKL) in which several kernels are combined adaptively for a given dataset. Many of the existing MKL methods use the SVM objective function and try to find a linear combination of basic kernels such that the separating margin between the classes is maximized. However, these methods ignore the fact that the theoretical error bound depends not only on the margin, but also on the radius of the smallest sphere that contains all the training instances. We present a novel MKL algorithm that optimizes the error bound taking account of both the margin and the radius. The empirical results show that the proposed method compares favorably with other state-of-the-art MKL methods.
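
For orientation, the commonly quoted form of the radius-margin bound (the classical hard-margin, leave-one-out version, constants omitted) shows why the radius matters; the paper's actual MKL objective is a related but distinct formulation.

```latex
% Classical radius--margin bound for hard-margin SVMs (leave-one-out form),
% shown for orientation only; the paper optimizes a related MKL objective.
\mathbb{E}\left[\mathrm{err}_{\mathrm{loo}}\right] \;\le\; \frac{1}{n}\,\mathbb{E}\!\left[\frac{R^{2}}{\gamma^{2}}\right]
```

Here R is the radius of the smallest sphere containing the training instances and gamma is the separating margin; maximizing the margin alone ignores the R-squared factor, which changes when kernels are combined.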


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2001

Feature Selection for Meta-learning

Alexandros Kalousis; Melanie Hilario

The selection of an appropriate inducer is crucial for performing effective classification. In previous work we presented a system called NOEMON which relied on a mapping between dataset characteristics and inducer performance to propose inducers for specific datasets. Instance-based learning was applied to meta-learning problems, each one associated with a specific pair of inducers. The generated models were used to provide a ranking of inducers on new datasets. Instance-based learning assumes that all the attributes have the same importance. We discovered that the best set of discriminating attributes is different for every pair of inducers. We applied a feature selection method on the meta-learning problems to get the best set of attributes for each problem. The performance of the system is significantly improved.
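
A minimal sketch of the refinement described above: before the instance-based meta-learner is trained for a given pair-of-inducers problem, a feature selection step keeps only the meta-attributes that discriminate best for that pair. The scikit-learn components and parameter values are illustrative assumptions.

```python
# Minimal sketch of the refinement: before the instance-based meta-learner is
# trained for a given pair of inducers, a feature selection step keeps only
# the meta-attributes that discriminate best for that pair. The scikit-learn
# components and parameter values are illustrative assumptions.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier

def meta_learner_for_pair(k_features=5, k_neighbors=3):
    """One such pipeline would be trained per pair of inducers on its meta-data."""
    return Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=k_features)),
        ("knn", KNeighborsClassifier(n_neighbors=k_neighbors)),
    ])

# Usage (meta_X: dataset characteristics, meta_y: which inducer of the pair won):
# model = meta_learner_for_pair().fit(meta_X, meta_y)
```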

Collaboration


Dive into Melanie Hilario's collaborations.

Top Co-Authors

Alexandros Kalousis

University of Applied Sciences Western Switzerland


Alex L. Mitchell

European Bioinformatics Institute
