Sarah Vluymans
Ghent University
Publications
Featured research published by Sarah Vluymans.
IEEE Transactions on Fuzzy Systems | 2015
Enislay Ramentol; Sarah Vluymans; Nele Verbiest; Yailé Caballero; Rafael Bello; Chris Cornelis; Francisco Herrera
Imbalanced classification deals with learning from data in which the classes contain a disproportionate number of samples. Traditional classifiers behave poorly on such data because they do not take the skewed class distribution into account. Four main kinds of solutions exist: modifying the data distribution, adapting the learning algorithm to account for the imbalance, assigning costs to data samples, and using ensemble methods. In this paper, we adopt the second type of solution and introduce a classification algorithm for imbalanced data that uses fuzzy rough set theory and ordered weighted average (OWA) aggregation. The proposal considers different strategies to build a weight vector that takes the data imbalance into account. Our methods are validated in an extensive experimental study, showing statistically better results than 13 state-of-the-art methods.
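The following Python sketch only illustrates the ingredients named in the abstract, OWA aggregation of fuzzy-rough lower approximation memberships with class-dependent weight vectors; the similarity relation, the two weighting schemes and all function names are assumptions and do not reproduce the paper's exact IFROWANN formulation.

import numpy as np

def similarity(x, y):
    # Simple fuzzy similarity relation for feature vectors scaled to [0, 1] (an assumption).
    return np.mean(1.0 - np.abs(x - y))

def owa(values, weights):
    # Ordered weighted average: sort the values in descending order, then weight them.
    return np.dot(np.sort(values)[::-1], weights)

def additive_soft_min_weights(n):
    # Linearly increasing weights over the descending-sorted values: a soft minimum.
    return 2.0 * np.arange(1, n + 1) / (n * (n + 1))

def exponential_soft_min_weights(n):
    # Exponentially increasing weights: a stricter approximation of the hard minimum.
    w = 2.0 ** np.arange(n)
    return w / w.sum()

def lower_approx(x, other_class, weights):
    # OWA-aggregated membership of x to a class's fuzzy-rough lower approximation,
    # built from the implication values 1 - R(x, y) over the samples y of the other class.
    return owa(np.array([1.0 - similarity(x, y) for y in other_class]), weights)

def classify(x, X_min, X_maj):
    # Class-dependent weight vectors: the minority approximation uses the softer vector
    # here and the majority the stricter one; the paper studies several such strategies.
    m_min = lower_approx(x, X_maj, additive_soft_min_weights(len(X_maj)))
    m_maj = lower_approx(x, X_min, exponential_soft_min_weights(len(X_min)))
    return "minority" if m_min >= m_maj else "majority"

# Toy usage on scaled two-dimensional data.
rng = np.random.default_rng(0)
X_maj = rng.uniform(0.0, 0.6, size=(50, 2))
X_min = rng.uniform(0.5, 1.0, size=(8, 2))
print(classify(np.array([0.9, 0.9]), X_min, X_maj))

The choice of weight vector controls how closely the OWA operator approximates a hard minimum, which is the kind of trade-off such weighting strategies are meant to tune under class imbalance.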
Pattern Recognition | 2016
Sarah Vluymans; Dánel Sánchez Tarragó; Yvan Saeys; Chris Cornelis; Francisco Herrera
In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class sizes of multi-instance data are imbalanced, classification performance degrades. Many multi-instance classifiers have been proposed, but few take into account the possibility of class imbalance, which causes them to fail in this situation. In this paper, we propose a new type of classifier that embodies a solution to the multi-instance class imbalance problem. Our proposal relies on the use of fuzzy rough set theory. We present two families of classifiers based on information extracted at bag level and at instance level, respectively. We experimentally show that our algorithms outperform state-of-the-art solutions to multi-instance imbalanced data classification, evaluated by the popular metrics AUC and geometric mean. Highlights: We propose a new type of classifier for imbalanced multi-instance data. Our classification method is based on fuzzy rough set theory. We develop a framework consisting of two classifier families. Our proposal is experimentally shown to outperform the state-of-the-art.
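As a rough illustration of the bag-level idea only (not the instance-level family, and not the authors' actual classifiers), the sketch below scores a test bag by its aggregated fuzzy similarity to the training bags of each class; the bag similarity measure and the per-class averaging are assumptions made for the example.

import numpy as np

def instance_similarity(a, b):
    # Fuzzy similarity between two instances, assuming features scaled to [0, 1].
    return np.mean(1.0 - np.abs(a - b))

def bag_similarity(bag_a, bag_b):
    # Average of best-matching instance similarities in both directions, a fuzzy
    # analogue of the average Hausdorff distance (one possible choice among many).
    sims = np.array([[instance_similarity(a, b) for b in bag_b] for a in bag_a])
    return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())

def classify_bag(test_bag, bags, labels):
    # Class score = mean similarity to that class's training bags; averaging per class
    # keeps the small class from being swamped by the large one under imbalance.
    scores = {}
    for c in set(labels):
        class_bags = [b for b, lab in zip(bags, labels) if lab == c]
        scores[c] = np.mean([bag_similarity(test_bag, b) for b in class_bags])
    return max(scores, key=scores.get)

# Toy usage: each bag is an array with one instance per row.
rng = np.random.default_rng(1)
bags = [rng.uniform(0.0, 0.5, (4, 3)) for _ in range(6)] + [rng.uniform(0.5, 1.0, (4, 3)) for _ in range(2)]
labels = ["negative"] * 6 + ["positive"] * 2
print(classify_bag(rng.uniform(0.5, 1.0, (4, 3)), bags, labels))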
Archive | 2016
Francisco Herrera; Sebastián Ventura; Rafael Bello; Chris Cornelis; Amelia Zafra; Dánel Sánchez-Tarragó; Sarah Vluymans
This book provides a general overview of multiple instance learning (MIL), defining the framework and covering the central paradigms. The authors discuss the most important algorithms for MIL tasks such as classification, regression and clustering. With a focus on classification, a taxonomy is set out and the most relevant proposals are specified. Efficient algorithms are developed to discover relevant information when working with uncertainty. Key representative applications are included. The book also studies the key related topics of distance metrics and alternative hypotheses. Chapters examine new and developing aspects of MIL such as data reduction for multi-instance problems and imbalanced MIL data. Class imbalance for multi-instance problems is defined at the bag level, a representation that involves ambiguity because bag labels are available while the labels of the individual instances are not. Additionally, multiple instance multiple label learning is explored. This learning framework introduces flexibility and ambiguity into the object representation, providing a natural formulation for representing complicated objects: an object is represented by a bag of instances and may have multiple class labels simultaneously. This book is suitable for developers and engineers working to apply MIL techniques to solve a variety of real-world problems. It is also useful for researchers or students seeking a thorough overview of the MIL literature, methods, and tools.
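A toy sketch of the multi-instance, multi-label representation mentioned above; the class and field names are illustrative, not taken from the book.

from dataclasses import dataclass
import numpy as np

@dataclass
class Bag:
    # One learning object: several unlabeled instances plus a set of bag-level labels.
    instances: np.ndarray   # shape (n_instances, n_features); instance labels are unknown
    labels: set             # bag-level labels; possibly more than one

# Hypothetical example: an image described by four region descriptors and two labels.
image = Bag(instances=np.random.default_rng(5).uniform(size=(4, 8)),
            labels={"beach", "sky"})
print(image.instances.shape, image.labels)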
Fundamenta Informaticae | 2015
Sarah Vluymans; Lynn D’eer; Yvan Saeys; Chris Cornelis
Data used in machine learning applications is prone to contain both vague and incomplete information. Many authors have proposed to use fuzzy rough set theory in the development of new techniques tackling these characteristics. Fuzzy sets deal with vague data, while rough sets allow incomplete information to be modeled. As such, the hybrid of the two paradigms is an ideal candidate tool to confront both challenges. In this paper, we present a thorough review of the use of fuzzy rough sets in machine learning applications. We recall their integration in preprocessing methods, consider learning algorithms in the supervised, unsupervised and semi-supervised domains, and outline future challenges. Throughout the paper, we highlight the interaction between theoretical advances on fuzzy rough sets and the practical machine learning tools that take advantage of them.
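For reference, a standard way to write the fuzzy-rough lower and upper approximations that underlie these methods is given below; the choice of fuzzy relation R, implicator and t-norm is a modelling decision, and the review covers several variants.

% Fuzzy-rough lower and upper approximations of a fuzzy set A in a universe U,
% under a fuzzy relation R, with implicator \mathcal{I} and t-norm \mathcal{T}.
\begin{align*}
  (R \downarrow A)(x) &= \inf_{y \in U} \mathcal{I}\bigl(R(x, y), A(y)\bigr),\\
  (R \uparrow A)(x)   &= \sup_{y \in U} \mathcal{T}\bigl(R(x, y), A(y)\bigr).
\end{align*}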
IEEE Congress on Evolutionary Computation | 2015
Isaac Triguero; Mikel Galar; Sarah Vluymans; Chris Cornelis; Humberto Bustince; Francisco Herrera; Yvan Saeys
Classification techniques for the big data scenario are in high demand in a wide variety of applications. The huge increase in available data may limit the applicability of most standard techniques. The problem becomes even more difficult when the class distribution is skewed, a setting known as imbalanced big data classification. Evolutionary undersampling techniques have been shown to be a very promising solution to the class imbalance problem. However, their practical application is limited to problems with no more than tens of thousands of instances. In this contribution, we design a parallel model that enables evolutionary undersampling methods to deal with large-scale problems. To do so, we rely on a MapReduce scheme that distributes the operation of these algorithms over a cluster of computing elements. Moreover, we develop a windowing approach for class-imbalanced data in order to speed up the undersampling process without losing accuracy. In our experiments, we test the capabilities of the proposed scheme on several data sets with up to 4 million instances. The results show promising scalability for evolutionary undersampling within the proposed framework.
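A highly simplified sketch of the map/reduce idea follows; a random undersampling stand-in replaces the evolutionary search, the windowing step is omitted, and all names are illustrative rather than taken from the authors' implementation.

import numpy as np

def undersample_partition(X, y, rng):
    # Stand-in for evolutionary undersampling: keep all minority instances of the
    # partition and randomly select an equal number of majority instances.
    minority = min(set(y), key=y.count)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    maj_idx = [i for i, c in enumerate(y) if c != minority]
    keep = min_idx + list(rng.choice(maj_idx, size=len(min_idx), replace=False))
    return X[keep], [y[i] for i in keep]

def map_reduce_undersample(X, y, n_partitions, seed=0):
    # "Map" phase: shuffle and split the data, undersample each partition independently.
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), n_partitions)
    mapped = [undersample_partition(X[p], [y[i] for i in p], rng) for p in parts]
    # "Reduce" phase: concatenate the balanced subsets produced by the mappers.
    X_out = np.vstack([Xp for Xp, _ in mapped])
    y_out = sum((yp for _, yp in mapped), [])
    return X_out, y_out

# Toy usage with a 95/5 class imbalance.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
y = ["majority"] * 950 + ["minority"] * 50
X_bal, y_bal = map_reduce_undersample(X, y, n_partitions=4)
print(len(y_bal), y_bal.count("minority"), y_bal.count("majority"))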
Archive | 2016
Francisco Herrera; Sebastián Ventura; Rafael Bello; Chris Cornelis; Amelia Zafra; Dánel Sánchez-Tarragó; Sarah Vluymans
This chapter provides a general introduction to the main subject matter of this work: multiple instance or multi-instance learning. The two terms are used interchangeably in the literature and they both convey the crucial point of difference with traditional (single-instance) learning. A formal description of multiple instance learning is provided in Sect. 2.1 and we discuss its origins in Sect. 2.2. In Sect. 2.3, we describe different learning tasks within this domain, which may or may not have an equivalent in single-instance learning. Finally, Sect. 2.4 lists a wide variety of applications corresponding to the different multi-instance learning paradigms.
Granular Computing | 2015
Richard Jensen; Sarah Vluymans; Neil Mac Parthaláin; Chris Cornelis; Yvan Saeys
With the continued and relentless growth in dataset sizes in recent times, feature or attribute selection has become a necessary step in tackling the resultant intractability. Indeed, as the number of dimensions increases, the number of corresponding data instances required in order to generate accurate models increases exponentially. Fuzzy-rough set-based feature selection techniques offer great flexibility when dealing with real-valued and noisy data; however, most of the current approaches focus on the supervised domain where the data object labels are known. Very little work has been carried out using fuzzy-rough sets in the areas of unsupervised or semi-supervised learning. This paper proposes a novel approach for semi-supervised fuzzy-rough feature selection where the object labels in the data may only be partially present. The approach also has the appealing property that any generated subsets are also valid (super)reducts when the whole dataset is labelled. The experimental evaluation demonstrates that the proposed approach can generate stable and valid subsets even when up to 90% of the data object labels are missing.
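The sketch below illustrates one way a greedy fuzzy-rough feature selection loop can keep partially labelled data in play, here by giving each unlabelled object its own singleton decision class; the similarity relation, dependency measure and this particular handling of unlabelled objects are assumptions for the example, not the method proposed in the paper.

import numpy as np

def similarity(X, subset):
    # Fuzzy indiscernibility over the chosen feature subset: 1 minus the largest
    # per-attribute difference, assuming features scaled to [0, 1].
    diffs = np.abs(X[:, None, subset] - X[None, :, subset])
    return 1.0 - diffs.max(axis=2)

def dependency(X, codes, subset):
    # Average membership of every object to the fuzzy-rough lower approximation of
    # its own decision class (Kleene-Dienes implicator, minimum as infimum).
    R = similarity(X, subset)
    same = codes[:, None] == codes[None, :]
    return np.where(same, 1.0, 1.0 - R).min(axis=1).mean()

def greedy_select(X, labels):
    # Unlabelled objects (None) receive unique singleton classes so that they still
    # constrain the selection; features are added greedily until the dependency of
    # the full feature set is matched.
    names = [lab if lab is not None else f"unlabelled_{i}" for i, lab in enumerate(labels)]
    codes = np.unique(np.array(names), return_inverse=True)[1]
    selected, remaining = [], list(range(X.shape[1]))
    target = dependency(X, codes, remaining)
    while remaining:
        gains = {f: dependency(X, codes, selected + [f]) for f in remaining}
        best = max(gains, key=gains.get)
        selected.append(best)
        remaining.remove(best)
        if gains[best] >= target:
            break
    return selected

# Toy usage: three features, roughly 30% of the labels missing.
rng = np.random.default_rng(3)
X = rng.uniform(size=(40, 3))
labels = [("a" if row[0] > 0.5 else "b") if rng.uniform() > 0.3 else None for row in X]
print(greedy_select(X, labels))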
Knowledge and Information Systems | 2018
Sarah Vluymans; Alberto Fernández; Yvan Saeys; Chris Cornelis; Francisco Herrera
Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. The core focus of the research community has been on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set theory, stands out for its performance in two-class imbalanced problems. In this paper, we consider its extension to multi-class data by combining it with one-versus-one decomposition. The latter transforms a multi-class problem into two-class sub-problems. Binary classifiers are applied to these sub-problems, after which their outcomes are aggregated into one prediction. We enhance the integration of IFROWANN in the decomposition scheme in two steps. Firstly, we propose an adaptive weight setting for the binary classifier, addressing the varying characteristics of the sub-problems. We call this modified classifier IFROWANN-$\mathcal{W}_{\mathrm{IR}}$.
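The one-versus-one scheme itself can be sketched in a few lines; in the example below a simple centroid classifier and plain majority voting stand in for IFROWANN and for the aggregation step described in the abstract, so only the decomposition structure is illustrated.

import numpy as np
from itertools import combinations

def centroid_classifier(X, y, pos, neg):
    # Stand-in binary classifier: predict the class whose centroid is closest.
    c_pos, c_neg = X[y == pos].mean(axis=0), X[y == neg].mean(axis=0)
    def predict(x):
        return pos if np.linalg.norm(x - c_pos) <= np.linalg.norm(x - c_neg) else neg
    return predict

def ovo_fit(X, y):
    # One binary classifier per unordered pair of classes.
    classes = np.unique(y)
    return classes, {(a, b): centroid_classifier(X, y, a, b) for a, b in combinations(classes, 2)}

def ovo_predict(x, classes, classifiers):
    # Plain majority voting over the pairwise predictions (a stand-in for the
    # aggregation step of the decomposition scheme).
    votes = {c: 0 for c in classes}
    for clf in classifiers.values():
        votes[clf(x)] += 1
    return max(votes, key=votes.get)

# Toy usage on a three-class problem.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.3, size=(20, 2)) for m in ([0, 0], [2, 0], [0, 2])])
y = np.array(["a"] * 20 + ["b"] * 20 + ["c"] * 20)
classes, clfs = ovo_fit(X, y)
print(ovo_predict(np.array([1.9, 0.1]), classes, clfs))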
Neurocomputing | 2016
Sarah Vluymans; Isaac Triguero; Chris Cornelis; Yvan Saeys
IEEE Transactions on Fuzzy Systems | 2016
Sarah Vluymans; Dánel Sánchez Tarragó; Yvan Saeys; Chris Cornelis; Francisco Herrera