Sarah Vluymans
Ghent University
Publications
Featured research published by Sarah Vluymans.
IEEE Transactions on Fuzzy Systems | 2015
Enislay Ramentol; Sarah Vluymans; Nele Verbiest; Yailé Caballero; Rafael Bello; Chris Cornelis; Francisco Herrera
Imbalanced classification deals with learning from data in which the classes contain a disproportionate number of samples. Traditional classifiers behave poorly on such data because they do not take the skewed class distribution into account. Four main kinds of solutions exist: modifying the data distribution, adapting the learning algorithm to account for the imbalance, assigning costs to data samples, and using ensemble methods. In this paper, we adopt the second type of solution and introduce a classification algorithm for imbalanced data that uses fuzzy rough set theory and ordered weighted average (OWA) aggregation. The proposal considers different strategies to build a weight vector that takes the data imbalance into account. Our methods are validated in an extensive experimental study, showing statistically better results than 13 state-of-the-art methods.
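The following Python sketch only illustrates the ingredients named in the abstract, OWA aggregation of fuzzy-rough lower approximation memberships with class-dependent weight vectors; the similarity relation, the two weighting schemes and all function names are assumptions and do not reproduce the paper's exact IFROWANN formulation.

import numpy as np

def similarity(x, y):
    # Simple fuzzy similarity relation for feature vectors scaled to [0, 1] (an assumption).
    return np.mean(1.0 - np.abs(x - y))

def owa(values, weights):
    # Ordered weighted average: sort the values in descending order, then weight them.
    return np.dot(np.sort(values)[::-1], weights)

def additive_soft_min_weights(n):
    # Linearly increasing weights over the descending-sorted values: a soft minimum.
    return 2.0 * np.arange(1, n + 1) / (n * (n + 1))

def exponential_soft_min_weights(n):
    # Exponentially increasing weights: a stricter approximation of the hard minimum.
    w = 2.0 ** np.arange(n)
    return w / w.sum()

def lower_approx(x, other_class, weights):
    # OWA-aggregated membership of x to a class's fuzzy-rough lower approximation,
    # built from the implication values 1 - R(x, y) over the samples y of the other class.
    return owa(np.array([1.0 - similarity(x, y) for y in other_class]), weights)

def classify(x, X_min, X_maj):
    # Class-dependent weight vectors: the minority approximation uses the softer vector
    # here and the majority the stricter one; the paper studies several such strategies.
    m_min = lower_approx(x, X_maj, additive_soft_min_weights(len(X_maj)))
    m_maj = lower_approx(x, X_min, exponential_soft_min_weights(len(X_min)))
    return "minority" if m_min >= m_maj else "majority"

# Toy usage on scaled two-dimensional data.
rng = np.random.default_rng(0)
X_maj = rng.uniform(0.0, 0.6, size=(50, 2))
X_min = rng.uniform(0.5, 1.0, size=(8, 2))
print(classify(np.array([0.9, 0.9]), X_min, X_maj))

The choice of weight vector controls how closely the OWA operator approximates a hard minimum, which is the kind of trade-off such weighting strategies are meant to tune under class imbalance.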
Pattern Recognition | 2016
Sarah Vluymans; Dánel Sánchez Tarragó; Yvan Saeys; Chris Cornelis; Francisco Herrera
In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class sizes of multi-instance data are imbalanced, classification performance degrades. Many multi-instance classifiers have been proposed, but few take into account the possibility of class imbalance, which causes them to fail in this situation. In this paper, we propose a new type of classifier that embodies a solution to the multi-instance class imbalance problem. Our proposal relies on the use of fuzzy rough set theory. We present two families of classifiers based on information extracted at bag level and at instance level, respectively. We experimentally show that our algorithms outperform state-of-the-art solutions to multi-instance imbalanced data classification, evaluated by the popular metrics AUC and geometric mean. Highlights: We propose a new type of classifier for imbalanced multi-instance data. Our classification method is based on fuzzy rough set theory. We develop a framework consisting of two classifier families. Our proposal is experimentally shown to outperform the state-of-the-art.
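As a rough illustration of the bag-level idea only (not the instance-level family, and not the authors' actual classifiers), the sketch below scores a test bag by its aggregated fuzzy similarity to the training bags of each class; the bag similarity measure and the per-class averaging are assumptions made for the example.

import numpy as np

def instance_similarity(a, b):
    # Fuzzy similarity between two instances, assuming features scaled to [0, 1].
    return np.mean(1.0 - np.abs(a - b))

def bag_similarity(bag_a, bag_b):
    # Average of best-matching instance similarities in both directions, a fuzzy
    # analogue of the average Hausdorff distance (one possible choice among many).
    sims = np.array([[instance_similarity(a, b) for b in bag_b] for a in bag_a])
    return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())

def classify_bag(test_bag, bags, labels):
    # Class score = mean similarity to that class's training bags; averaging per class
    # keeps the small class from being swamped by the large one under imbalance.
    scores = {}
    for c in set(labels):
        class_bags = [b for b, lab in zip(bags, labels) if lab == c]
        scores[c] = np.mean([bag_similarity(test_bag, b) for b in class_bags])
    return max(scores, key=scores.get)

# Toy usage: each bag is an array with one instance per row.
rng = np.random.default_rng(1)
bags = [rng.uniform(0.0, 0.5, (4, 3)) for _ in range(6)] + [rng.uniform(0.5, 1.0, (4, 3)) for _ in range(2)]
labels = ["negative"] * 6 + ["positive"] * 2
print(classify_bag(rng.uniform(0.5, 1.0, (4, 3)), bags, labels))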
Archive | 2016
Francisco Herrera; Sebastián Ventura; Rafael Bello; Chris Cornelis; Amelia Zafra; Dánel Sánchez-Tarragó; Sarah Vluymans
This book provides a general overview of multiple instance learning (MIL), defining the framework and covering the central paradigms. The authors discuss the most important algorithms for MIL tasks such as classification, regression and clustering. With a focus on classification, a taxonomy is set out and the most relevant proposals are specified. Efficient algorithms are developed to discover relevant information when working with uncertainty. Key representative applications are included. The book also studies the key related topics of distance metrics and alternative hypotheses. Chapters examine new and developing aspects of MIL such as data reduction for multi-instance problems and imbalanced MIL data. Class imbalance for multi-instance problems is defined at the bag level, a representation that involves ambiguity because bag labels are available while the labels of the individual instances are not. Additionally, multiple instance multiple label learning is explored. This learning framework introduces flexibility and ambiguity into the object representation, providing a natural formulation for representing complicated objects: an object is represented by a bag of instances and may have multiple class labels simultaneously. This book is suitable for developers and engineers working to apply MIL techniques to solve a variety of real-world problems. It is also useful for researchers or students seeking a thorough overview of the MIL literature, methods, and tools.
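A toy sketch of the multi-instance, multi-label representation mentioned above; the class and field names are illustrative, not taken from the book.

from dataclasses import dataclass
import numpy as np

@dataclass
class Bag:
    # One learning object: several unlabeled instances plus a set of bag-level labels.
    instances: np.ndarray   # shape (n_instances, n_features); instance labels are unknown
    labels: set             # bag-level labels; possibly more than one

# Hypothetical example: an image described by four region descriptors and two labels.
image = Bag(instances=np.random.default_rng(5).uniform(size=(4, 8)),
            labels={"beach", "sky"})
print(image.instances.shape, image.labels)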
Fundamenta Informaticae | 2015
Sarah Vluymans; Lynn D’eer; Yvan Saeys; Chris Cornelis
Data used in machine learning applications is prone to contain both vague and incomplete information. Many authors have proposed to use fuzzy rough set theory in the development of new techniques tackling these characteristics. Fuzzy sets deal with vague data, while rough sets allow incomplete information to be modeled. As such, the hybrid of the two paradigms is an ideal candidate tool to confront both challenges. In this paper, we present a thorough review of the use of fuzzy rough sets in machine learning applications. We recall their integration in preprocessing methods, consider learning algorithms in the supervised, unsupervised and semi-supervised domains, and outline future challenges. Throughout the paper, we highlight the interaction between theoretical advances on fuzzy rough sets and the practical machine learning tools that take advantage of them.
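For reference, a standard way to write the fuzzy-rough lower and upper approximations that underlie these methods is given below; the choice of fuzzy relation R, implicator and t-norm is a modelling decision, and the review covers several variants.

% Fuzzy-rough lower and upper approximations of a fuzzy set A in a universe U,
% under a fuzzy relation R, with implicator \mathcal{I} and t-norm \mathcal{T}.
\begin{align*}
  (R \downarrow A)(x) &= \inf_{y \in U} \mathcal{I}\bigl(R(x, y), A(y)\bigr),\\
  (R \uparrow A)(x)   &= \sup_{y \in U} \mathcal{T}\bigl(R(x, y), A(y)\bigr).
\end{align*}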
IEEE Congress on Evolutionary Computation | 2015
Isaac Triguero; Mikel Galar; Sarah Vluymans; Chris Cornelis; Humberto Bustince; Francisco Herrera; Yvan Saeys
Classification techniques for the big data scenario are in high demand in a wide variety of applications. The huge increase in available data may limit the applicability of most standard techniques. The problem becomes even more difficult when the class distribution is skewed, a setting known as imbalanced big data classification. Evolutionary undersampling techniques have been shown to be a very promising solution to the class imbalance problem. However, their practical application is limited to problems with no more than tens of thousands of instances. In this contribution, we design a parallel model that enables evolutionary undersampling methods to deal with large-scale problems. To do so, we rely on a MapReduce scheme that distributes the operation of these algorithms over a cluster of computing elements. Moreover, we develop a windowing approach for class-imbalanced data in order to speed up the undersampling process without losing accuracy. In our experiments, we test the capabilities of the proposed scheme on several data sets with up to 4 million instances. The results show promising scalability for evolutionary undersampling within the proposed framework.
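A highly simplified sketch of the map/reduce idea follows; a random undersampling stand-in replaces the evolutionary search, the windowing step is omitted, and all names are illustrative rather than taken from the authors' implementation.

import numpy as np

def undersample_partition(X, y, rng):
    # Stand-in for evolutionary undersampling: keep all minority instances of the
    # partition and randomly select an equal number of majority instances.
    minority = min(set(y), key=y.count)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    maj_idx = [i for i, c in enumerate(y) if c != minority]
    keep = min_idx + list(rng.choice(maj_idx, size=len(min_idx), replace=False))
    return X[keep], [y[i] for i in keep]

def map_reduce_undersample(X, y, n_partitions, seed=0):
    # "Map" phase: shuffle and split the data, undersample each partition independently.
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), n_partitions)
    mapped = [undersample_partition(X[p], [y[i] for i in p], rng) for p in parts]
    # "Reduce" phase: concatenate the balanced subsets produced by the mappers.
    X_out = np.vstack([Xp for Xp, _ in mapped])
    y_out = sum((yp for _, yp in mapped), [])
    return X_out, y_out

# Toy usage with a 95/5 class imbalance.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
y = ["majority"] * 950 + ["minority"] * 50
X_bal, y_bal = map_reduce_undersample(X, y, n_partitions=4)
print(len(y_bal), y_bal.count("minority"), y_bal.count("majority"))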
Archive | 2016
Francisco Herrera; Sebastián Ventura; Rafael Bello; Chris Cornelis; Amelia Zafra; Dánel Sánchez-Tarragó; Sarah Vluymans
This chapter provides a general introduction to the main subject matter of this work: multiple instance or multi-instance learning. The two terms are used interchangeably in the literature and they both convey the crucial point of difference with traditional (single-instance) learning. A formal description of multiple instance learning is provided in Sect. 2.1 and we discuss its origins in Sect. 2.2. In Sect. 2.3, we describe different learning tasks within this domain, which may or may not have an equivalent in single-instance learning. Finally, Sect. 2.4 lists a wide variety of applications corresponding to the different multi-instance learning paradigms.
Granular Computing | 2015
Richard Jensen; Sarah Vluymans; Neil Mac Parthaláin; Chris Cornelis; Yvan Saeys
With the continued and relentless growth in dataset sizes in recent times, feature or attribute selection has become a necessary step in tackling the resultant intractability. Indeed, as the number of dimensions increases, the number of corresponding data instances required in order to generate accurate models increases exponentially. Fuzzy-rough set-based feature selection techniques offer great flexibility when dealing with real-valued and noisy data; however, most of the current approaches focus on the supervised domain where the data object labels are known. Very little work has been carried out using fuzzy-rough sets in the areas of unsupervised or semi-supervised learning. This paper proposes a novel approach for semi-supervised fuzzy-rough feature selection where the object labels in the data may only be partially present. The approach also has the appealing property that any generated subsets are also valid (super)reducts when the whole dataset is labelled. The experimental evaluation demonstrates that the proposed approach can generate stable and valid subsets even when up to 90% of the data object labels are missing.
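The sketch below illustrates one way a greedy fuzzy-rough feature selection loop can keep partially labelled data in play, here by giving each unlabelled object its own singleton decision class; the similarity relation, dependency measure and this particular handling of unlabelled objects are assumptions for the example, not the method proposed in the paper.

import numpy as np

def similarity(X, subset):
    # Fuzzy indiscernibility over the chosen feature subset: 1 minus the largest
    # per-attribute difference, assuming features scaled to [0, 1].
    diffs = np.abs(X[:, None, subset] - X[None, :, subset])
    return 1.0 - diffs.max(axis=2)

def dependency(X, codes, subset):
    # Average membership of every object to the fuzzy-rough lower approximation of
    # its own decision class (Kleene-Dienes implicator, minimum as infimum).
    R = similarity(X, subset)
    same = codes[:, None] == codes[None, :]
    return np.where(same, 1.0, 1.0 - R).min(axis=1).mean()

def greedy_select(X, labels):
    # Unlabelled objects (None) receive unique singleton classes so that they still
    # constrain the selection; features are added greedily until the dependency of
    # the full feature set is matched.
    names = [lab if lab is not None else f"unlabelled_{i}" for i, lab in enumerate(labels)]
    codes = np.unique(np.array(names), return_inverse=True)[1]
    selected, remaining = [], list(range(X.shape[1]))
    target = dependency(X, codes, remaining)
    while remaining:
        gains = {f: dependency(X, codes, selected + [f]) for f in remaining}
        best = max(gains, key=gains.get)
        selected.append(best)
        remaining.remove(best)
        if gains[best] >= target:
            break
    return selected

# Toy usage: three features, roughly 30% of the labels missing.
rng = np.random.default_rng(3)
X = rng.uniform(size=(40, 3))
labels = [("a" if row[0] > 0.5 else "b") if rng.uniform() > 0.3 else None for row in X]
print(greedy_select(X, labels))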
Knowledge and Information Systems | 2018
Sarah Vluymans; Alberto Fernández; Yvan Saeys; Chris Cornelis; Francisco Herrera
Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. The core focus of the research community has been on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set theory, stands out for its performance in two-class imbalanced problems. In this paper, we consider its extension to multi-class data by combining it with one-versus-one decomposition. The latter transforms a multi-class problem into two-class sub-problems. Binary classifiers are applied to these sub-problems, after which their outcomes are aggregated into one prediction. We enhance the integration of IFROWANN in the decomposition scheme in two steps. Firstly, we propose an adaptive weight setting for the binary classifier, addressing the varying characteristics of the sub-problems. We call this modified classifier IFROWANN-$\mathcal{W}_{\mathrm{IR}}$.
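The one-versus-one scheme itself can be sketched in a few lines; in the example below a simple centroid classifier and plain majority voting stand in for IFROWANN and for the aggregation step described in the abstract, so only the decomposition structure is illustrated.

import numpy as np
from itertools import combinations

def centroid_classifier(X, y, pos, neg):
    # Stand-in binary classifier: predict the class whose centroid is closest.
    c_pos, c_neg = X[y == pos].mean(axis=0), X[y == neg].mean(axis=0)
    def predict(x):
        return pos if np.linalg.norm(x - c_pos) <= np.linalg.norm(x - c_neg) else neg
    return predict

def ovo_fit(X, y):
    # One binary classifier per unordered pair of classes.
    classes = np.unique(y)
    return classes, {(a, b): centroid_classifier(X, y, a, b) for a, b in combinations(classes, 2)}

def ovo_predict(x, classes, classifiers):
    # Plain majority voting over the pairwise predictions (a stand-in for the
    # aggregation step of the decomposition scheme).
    votes = {c: 0 for c in classes}
    for clf in classifiers.values():
        votes[clf(x)] += 1
    return max(votes, key=votes.get)

# Toy usage on a three-class problem.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.3, size=(20, 2)) for m in ([0, 0], [2, 0], [0, 2])])
y = np.array(["a"] * 20 + ["b"] * 20 + ["c"] * 20)
classes, clfs = ovo_fit(X, y)
print(ovo_predict(np.array([1.9, 0.1]), classes, clfs))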
Neurocomputing | 2016
Sarah Vluymans; Isaac Triguero; Chris Cornelis; Yvan Saeys
IEEE Transactions on Fuzzy Systems | 2016
Sarah Vluymans; Dánel Sánchez Tarragó; Yvan Saeys; Chris Cornelis; Francisco Herrera