Francisco Charte
University of Granada
Publications
Featured research published by Francisco Charte.
Neurocomputing | 2015
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
The purpose of this paper is to analyze the imbalanced learning task in the multilabel scenario, with two goals. The first is to present specialized measures for assessing the imbalance level in multilabel datasets (MLDs). Using these measures we will be able to conclude which MLDs are imbalanced and would therefore need appropriate treatment. The second is to propose several algorithms designed to reduce the imbalance in MLDs in a classifier-independent way, by means of resampling techniques. Two different approaches to dividing the instances into minority and majority groups are studied. One of them considers each label combination as a class identifier, whereas the other evaluates the imbalance level of each label individually. A random undersampling and a random oversampling algorithm are proposed for each approach, resulting in four different algorithms. All of them are experimentally tested and their effectiveness is statistically evaluated. From the results obtained, a set of guidelines showing when these methods should be applied is also provided.
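The abstract does not spell out the measures, so the following is only a minimal sketch of the per-label approach. It assumes a per-label imbalance ratio defined as the count of the most frequent label divided by the count of the label in question, with the mean of those ratios as a dataset-level summary. The function names and the toy data are hypothetical.

```python
from collections import Counter


def label_counts(labelsets):
    """Count how many instances carry each label."""
    counts = Counter()
    for labels in labelsets:
        counts.update(labels)
    return counts


def imbalance_ratios(labelsets):
    """Assumed per-label ratio: count of the most frequent label / count of this label."""
    counts = label_counts(labelsets)
    top = max(counts.values())
    return {label: top / n for label, n in counts.items()}


def mean_imbalance_ratio(labelsets):
    """Average of the per-label ratios, used here as a dataset-level summary."""
    ratios = imbalance_ratios(labelsets)
    return sum(ratios.values()) / len(ratios)


def minority_labels(labelsets):
    """Labels whose ratio exceeds the mean are treated as minority labels (assumption)."""
    ratios = imbalance_ratios(labelsets)
    mean = mean_imbalance_ratio(labelsets)
    return {label for label, ratio in ratios.items() if ratio > mean}


# Toy MLD: each instance is represented only by its set of labels.
toy = [{"a"}, {"a", "b"}, {"a"}, {"a", "c"}, {"b"}, {"a"}]
print(imbalance_ratios(toy))
print(minority_labels(toy))
```

A resampling step built on this would clone or remove instances according to whether they contain minority labels. The label-combination approach would instead count whole label sets as if each were a single class.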
Knowledge-Based Systems | 2015
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
Learning from imbalanced data is a problem that arises in many real-world scenarios, as does the need to build classifiers able to predict more than one class label simultaneously (multilabel classification). Dealing with imbalance by means of resampling methods is an approach that has been studied in depth lately, primarily in the context of traditional (non-multilabel) classification. In this paper the process of synthetic instance generation for multilabel datasets (MLDs) is studied and MLSMOTE (Multilabel Synthetic Minority Over-sampling Technique), a new algorithm designed to produce synthetic instances for imbalanced MLDs, is proposed. An extensive review of how imbalance in the multilabel context has been tackled in the past is provided, along with a thorough experimental study aimed at verifying the benefits of the proposed algorithm. Several multilabel classification algorithms and other multilabel oversampling methods are considered, as well as ensemble-based algorithms for imbalanced multilabel classification. The empirical analysis shows that MLSMOTE is able to improve the classification results produced by existing proposals.
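To illustrate what synthetic instance generation can look like in the multilabel setting, here is a SMOTE-style sketch. It is not the published MLSMOTE algorithm. It assumes numeric features, Euclidean nearest neighbours among the instances that carry a given minority label, linear interpolation for the features, and a majority vote over the seed and its neighbours for the labels. All names are hypothetical.

```python
import math
import random


def synthesize(seed, neighbors, rng):
    """Build one synthetic (features, labels) pair from a seed and its neighbours."""
    seed_x, seed_y = seed
    ref_x, _ = rng.choice(neighbors)
    gap = rng.random()  # position between the seed and the chosen neighbour
    new_x = [s + gap * (r - s) for s, r in zip(seed_x, ref_x)]
    # Label vote (assumption): keep labels present in at least half of the group.
    group = [seed_y] + [labels for _, labels in neighbors]
    votes = {}
    for labels in group:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    new_y = {label for label, n in votes.items() if 2 * n >= len(group)}
    return new_x, new_y


def oversample_label(data, label, k, n_new, rng):
    """Create n_new synthetic instances from the instances that carry `label`."""
    minority = [inst for inst in data if label in inst[1]]
    if len(minority) < 2:
        return []  # interpolation needs at least one neighbour
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        seed = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbors = sorted(others, key=lambda inst: math.dist(seed[0], inst[0]))[:k]
        synthetic.append(synthesize(seed, neighbors, rng))
    return synthetic


# Toy data: (feature vector, label set); "rare" is the minority label.
data = [
    ([0.0, 0.0], {"common"}),
    ([1.0, 1.0], {"common", "rare"}),
    ([1.2, 0.8], {"rare"}),
    ([0.9, 1.1], {"common", "rare"}),
    ([0.1, 0.2], {"common"}),
]
new_instances = oversample_label(data, "rare", k=2, n_new=3, rng=random.Random(0))
```

Because every seed and neighbour contains the target label, the vote always keeps it. Other labels survive only when they are shared by at least half of the neighbourhood.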
Hybrid Artificial Intelligence Systems | 2013
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
The process of learning from imbalanced datasets has been studied in depth for binary and multi-class classification. This problem also affects multi-label datasets. In fact, the imbalance level in multi-label datasets tends to be much larger than in binary or multi-class datasets. Nevertheless, proposals on how to measure and deal with imbalanced datasets in multi-label classification are scarce.
Hybrid Artificial Intelligence Systems | 2014
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
In the context of multilabel classification, learning from imbalanced data has been receiving considerable attention recently. Several algorithms to tackle this problem have been proposed in the last five years, as well as various measures to assess the imbalance level. Some of the proposed methods are based on resampling techniques, a well-known approach whose utility in traditional classification has been proven. This paper aims to describe how a specific characteristic of multilabel datasets (MLDs), the level of concurrence among imbalanced labels, could have a great impact on the behavior of resampling algorithms. To this end, a measure named SCUMBLE, designed to evaluate this concurrence level, is proposed and its usefulness is experimentally tested. As a result, a straightforward guideline on the effectiveness of multilabel resampling algorithms depending on the characteristics of the MLD can be inferred.
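The abstract names SCUMBLE but does not define it. The sketch below is one plausible reading and should be checked against the paper. For each instance it compares the geometric and arithmetic means of the imbalance ratios of the labels active in that instance, then averages the result over the dataset. The per-label ratio (most frequent label count divided by this label's count) is likewise an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def imbalance_ratios(labelsets):
    """Assumed per-label ratio: count of the most frequent label / count of this label."""
    counts = Counter()
    for labels in labelsets:
        counts.update(labels)
    top = max(counts.values())
    return {label: top / n for label, n in counts.items()}


def concurrence_instance(labels, ratios):
    """Score near 0 when the active labels have similar ratios, larger when they differ."""
    values = [ratios[label] for label in labels]
    if not values:
        return 0.0
    arithmetic = sum(values) / len(values)
    geometric = math.exp(sum(math.log(v) for v in values) / len(values))
    return 1.0 - geometric / arithmetic


def concurrence_dataset(labelsets):
    """Average of the per-instance scores."""
    ratios = imbalance_ratios(labelsets)
    return sum(concurrence_instance(y, ratios) for y in labelsets) / len(labelsets)


# Rare and frequent labels never co-occur: the score stays at (about) zero.
separate = [{"a"}, {"a"}, {"a"}, {"a"}, {"b"}]
# The rare label "b" only appears together with the frequent label "a".
mixed = [{"a"}, {"a"}, {"a"}, {"a", "b"}]
print(concurrence_dataset(separate), concurrence_dataset(mixed))
```

Under this reading, a high score indicates that minority labels mostly appear alongside majority labels. Cloning or deleting whole instances then changes both kinds of label at once, which is the situation the abstract identifies as affecting resampling.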
Intelligent Data Engineering and Automated Learning | 2014
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
Learning from imbalanced multilabel data is a challenging task that has attracted considerable attention lately. Some resampling algorithms used in traditional classification, such as random undersampling and random oversampling, have already been adapted to work with multilabel datasets.
IEEE Transactions on Neural Networks | 2014
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
Multilabel classification (MLC) has generated considerable research interest in recent years, as a technique that can be applied to many real-world scenarios. To process multilabel data with binary or multiclass classifiers, methods for transforming multilabel data sets (MLDs) have been proposed, as well as adapted algorithms able to work with this type of data set. However, until now, few studies have addressed the problem of how to deal with MLDs having a large number of labels. This characteristic can be defined as high dimensionality in the label space (output attributes), in contrast to the traditional high dimensionality problem, which usually focuses on the feature space (by means of feature selection) or the sample space (by means of instance selection). The purpose of this paper is to analyze dimensionality in the label space in MLDs, and to present a transformation methodology based on the use of association rules to discover label dependencies. These dependencies are used to reduce the label space, to ease the work of any MLC algorithm, and to infer the deleted labels in a final postprocessing stage. The proposed process is validated through extensive experimentation with several MLDs and classification algorithms, resulting in a statistically significant improvement in performance in some cases, as will be shown.
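The three stages described here (discover label dependencies, reduce the label space, infer the removed labels afterwards) can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes rules with a single antecedent label, a plain confidence threshold, and a greedy choice of rules. The function names and toy data are hypothetical.

```python
from itertools import permutations


def mine_rules(labelsets, min_conf):
    """Return (antecedent, consequent) label pairs whose confidence reaches min_conf."""
    labels = sorted(set().union(*labelsets))
    rules, removed, antecedents = [], set(), set()
    for a, c in permutations(labels, 2):
        support_a = sum(1 for y in labelsets if a in y)
        both = sum(1 for y in labelsets if a in y and c in y)
        if support_a == 0 or both / support_a < min_conf:
            continue
        # Greedy guard: a removed label cannot drive a rule, and a label
        # that drives a rule (or is already removed) cannot be removed.
        if a in removed or c in antecedents or c in removed:
            continue
        rules.append((a, c))
        removed.add(c)
        antecedents.add(a)
    return rules


def reduce_labels(labelsets, rules):
    """Drop every consequent label before training the classifier."""
    removed = {c for _, c in rules}
    return [y - removed for y in labelsets]


def infer_labels(predicted, rules):
    """Postprocessing: add each consequent whose antecedent was predicted."""
    restored = set(predicted)
    for a, c in rules:
        if a in predicted:
            restored.add(c)
    return restored


labelsets = [{"beach", "sea"}, {"beach", "sea", "sun"}, {"sea"}, {"sun"}]
rules = mine_rules(labelsets, min_conf=0.9)
reduced = reduce_labels(labelsets, rules)
print(rules, reduced, infer_labels({"beach"}, rules))
```

In this toy data the third instance loses "sea" and nothing restores it, since a rule only fires when its antecedent is predicted. The reduction is therefore lossy in general, which is consistent with the abstract reporting improvements only in some cases.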
Conference on Computer as a Tool | 2015
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
The Web is widely used nowadays to obtain information about almost any topic, from scientific procedures to cooking recipes. Electronic forums are very popular, with thousands of questions asked and answered every day. Correctly tagging the questions posted by users usually produces quicker and better answers from other users and experts. In this paper a prototype of a system aimed at assisting users while tagging their questions is proposed. To accomplish this task, the text of each post is first processed to produce a multilabel dataset, and then a lazy nearest neighbor multilabel classification algorithm is used to predict the tags on new posts. The results obtained are promising, opening the door to the development of a fully automated system for this task.
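The two steps mentioned here, turning post text into a multilabel dataset and predicting tags with a lazy nearest neighbor method, can be sketched as below. The abstract does not specify the text processing or the classifier. This sketch assumes lowercase word-set tokens, Jaccard similarity, and a vote among the k most similar posts. All names and the toy posts are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word set; a stand-in for the unspecified text processing."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def jaccard(a, b):
    """Share of tokens the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(question, posts, k=3, min_share=0.5):
    """Suggest the tags carried by at least min_share of the k most similar posts."""
    q = tokens(question)
    ranked = sorted(posts, key=lambda p: jaccard(q, tokens(p[0])), reverse=True)[:k]
    votes = Counter(tag for _, tags in ranked for tag in tags)
    return {tag for tag, n in votes.items() if n / len(ranked) >= min_share}


# Toy "training" posts: (question text, tag set).
posts = [
    ("how to merge two data frames in r", {"r", "dataframe"}),
    ("merge data frames by column in r", {"r", "merge"}),
    ("python list comprehension syntax", {"python"}),
    ("plot a histogram in r", {"r", "plot"}),
]
print(suggest_tags("merge two data frames in r by key", posts, k=3))
```

The method is lazy in the sense that no model is built in advance: all the work happens when a new question arrives and is compared with the stored posts.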
Hybrid Artificial Intelligence Systems | 2016
Francisco Charte; David Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
Multilabeled data is everywhere on the Internet. From news in digital media and blog entries to videos hosted on YouTube, every object is usually tagged with a set of labels, so that it can be categorized into several non-exclusive groups. However, publicly available multilabel datasets (MLDs) are not so common. A handful of websites provide a few of them, using disparate file formats. Finding suitable MLDs, converting them into the correct format, and locating the appropriate bibliographic data to cite them are some of the difficulties researchers and practitioners usually face.
Hybrid Artificial Intelligence Systems | 2016
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
Multilabel classification (MLC) is an increasingly widespread data mining technique. Its goal is to categorize patterns into several non-exclusive groups, and it is applied in fields such as news categorization, image labeling and music classification. MLC is a more complex task than multiclass and binary classification, since the classifier must learn the presence of several outputs at once from the same set of predictive variables. The very nature of the data the classifier has to deal with implies a certain degree of complexity. Measuring this complexity strictly from the characteristics of the data would be an interesting objective. At the same time, the strategy used to partition the data also influences the sample patterns the algorithm has at its disposal to train the classifier. In MLC, random sampling is commonly used to accomplish this task.
Hybrid Artificial Intelligence Systems | 2012
Francisco Charte; Antonio J. Rivera; María José del Jesús; Francisco Herrera
Multi-label classification is a generalization of well-known problems, such as binary or multi-class classification, in which each processed instance is associated not with a single class (label) but with a subset of them. In recent years different techniques have appeared which, through the transformation of the data or the adaptation of classic algorithms, aim to provide a solution to this relatively recent type of classification problem. This paper presents a new transformation technique for multi-label classification, based on the use of association rules to reduce the label space.