Pablo Bermejo
University of Castilla–La Mancha
Publication
Featured research published by Pablo Bermejo.
Knowledge Based Systems | 2012
Pablo Bermejo; Luis de la Ossa; José A. Gámez; José Miguel Puerta
This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. Recently the literature has contained numerous references to the use of hybrid selection algorithms: based on a filter ranking, they perform an incremental wrapper selection over that ranking. Though working fine, these methods still have their problems: (1) depending on the complexity of the wrapper search method, the number of wrapper evaluations can still be too large; and (2) they rely on a univariate ranking that does not take into account interaction between the variables already included in the selected subset and the remaining ones. Here we propose a new approach whose main goal is to drastically reduce the number of wrapper evaluations while maintaining good performance (e.g. accuracy and size of the obtained subset). To do this we propose an algorithm that iteratively alternates between filter ranking construction and wrapper feature subset selection (FSS). Thus, the FSS only uses the first block of ranked attributes and the ranking method uses the current selected subset in order to build a new ranking where this knowledge is considered. The algorithm terminates when no new attribute is selected in the last call to the FSS algorithm. The main advantage of this approach is that only a few blocks of variables are analyzed, and so the number of wrapper evaluations decreases drastically. The proposed method is tested over eleven high-dimensional datasets (2400-46,000 variables) using different classifiers. The results show an impressive reduction in the number of wrapper evaluations without degrading the quality of the obtained subset.
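The alternation between filter re-ranking and wrapper selection described in this abstract can be sketched as a loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `rerank_fss`, `rank_given`, `wrapper_over_block`, `toy_accuracy`, and the relevance and redundancy tables are all hypothetical stand-ins. A real filter would compute its measure from data, and a real wrapper would train and evaluate a classifier.

```python
from typing import Callable, List, Sequence

# Hypothetical toy data: "b" carries the same information as "a".
RELEVANCE = {"a": 3, "b": 2, "c": 2, "noise": 0}
REDUNDANT_WITH = {"b": "a"}

def toy_accuracy(subset: Sequence[str]) -> int:
    """Stand-in for a wrapper evaluation (accuracy in percent)."""
    s = set(subset)
    acc = 50
    if "a" in s:
        acc += 20
    elif "b" in s:
        acc += 10
    if "c" in s:
        acc += 10
    return acc

def rank_given(selected: List[str], remaining: List[str]) -> List[str]:
    """Stand-in filter ranking that penalizes redundancy with the selected subset."""
    def key(attr: str) -> int:
        penalty = 2 if REDUNDANT_WITH.get(attr) in selected else 0
        return RELEVANCE[attr] - penalty
    return sorted(remaining, key=key, reverse=True)

def wrapper_over_block(selected: List[str], block: List[str]) -> List[str]:
    """Incremental wrapper over one block: keep an attribute only if accuracy rises."""
    current = list(selected)
    best = toy_accuracy(current)
    for attr in block:
        acc = toy_accuracy(current + [attr])
        if acc > best:
            current.append(attr)
            best = acc
    return current

def rerank_fss(attrs: Sequence[str],
               rank: Callable[[List[str], List[str]], List[str]],
               wrapper: Callable[[List[str], List[str]], List[str]],
               block_size: int) -> List[str]:
    """Alternate ranking and wrapper selection until a block adds nothing."""
    selected: List[str] = []
    while True:
        remaining = [a for a in attrs if a not in selected]
        if not remaining:
            return selected
        block = rank(selected, remaining)[:block_size]  # only the first block is examined
        updated = wrapper(selected, block)
        if len(updated) == len(selected):  # no new attribute selected: stop
            return selected
        selected = updated

result = rerank_fss(["a", "b", "c", "noise"], rank_given, wrapper_over_block, block_size=2)
print(result)
```

In this toy run the redundant attribute "b" drops in the second ranking once "a" is selected, so only three blocks of two attributes are ever passed to the wrapper.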
Pattern Recognition Letters | 2011
Pablo Bermejo; José A. Gámez; José Miguel Puerta
Feature subset selection is a key problem in the data-mining classification task that helps to obtain more compact and understandable models without degrading (or even improving) their performance. In this work we focus on FSS in high-dimensional datasets, that is, with a very large number of predictive attributes. In this case, standard sophisticated wrapper algorithms cannot be applied because of their complexity, and computationally lighter filter-wrapper algorithms have recently been proposed. In this work we propose a stochastic algorithm based on the GRASP meta-heuristic, with the main goal of speeding up the feature subset selection process, basically by reducing the number of wrapper evaluations to carry out. GRASP is a multi-start constructive method which constructs a solution in its first stage, and then runs an improving stage over that solution. Several instances of the proposed GRASP method are experimentally tested and compared with state-of-the-art algorithms over 12 high-dimensional datasets. The statistical analysis of the results shows that our proposal is comparable in accuracy and cardinality of the selected subset to previous algorithms, but requires significantly fewer evaluations.
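The two-stage GRASP structure mentioned in this abstract (a randomized constructive stage followed by an improving stage, repeated over several starts) can be sketched generically. This is an assumption-laden illustration of the general meta-heuristic, not the algorithm proposed in the paper: `grasp_fss`, `toy_accuracy`, the restricted-candidate-list size, and the backward-removal improving step are all hypothetical choices, and the sketch makes no attempt to reduce the number of wrapper evaluations as the paper does.

```python
import random
from typing import Callable, List, Sequence, Tuple

def toy_accuracy(subset: Sequence[str]) -> int:
    """Stand-in for a wrapper evaluation (accuracy in percent)."""
    s = set(subset)
    acc = 50
    if "a" in s:
        acc += 20
    elif "b" in s:
        acc += 10
    if "c" in s:
        acc += 10
    return acc

def grasp_fss(attrs: Sequence[str],
              score: Callable[[Sequence[str]], int],
              iterations: int = 5,
              rcl_size: int = 3,
              seed: int = 0) -> Tuple[List[str], int]:
    rng = random.Random(seed)
    best_subset: List[str] = []
    best_score = score([])
    for _ in range(iterations):
        # Stage 1: randomized greedy construction from a restricted candidate list.
        subset: List[str] = []
        current = score(subset)
        remaining = list(attrs)
        while remaining:
            ranked = sorted(remaining, key=lambda a: score(subset + [a]), reverse=True)
            rcl = [a for a in ranked[:rcl_size] if score(subset + [a]) > current]
            if not rcl:
                break
            pick = rng.choice(rcl)
            subset.append(pick)
            remaining.remove(pick)
            current = score(subset)
        # Stage 2: improvement by dropping attributes that are not needed.
        improved = True
        while improved:
            improved = False
            for attr in subset:
                reduced = [x for x in subset if x != attr]
                if score(reduced) >= current:
                    subset, current = reduced, score(reduced)
                    improved = True
                    break
        # Keep the best start: higher score first, then smaller subset.
        if current > best_score or (current == best_score and len(subset) < len(best_subset)):
            best_subset, best_score = subset, current
    return best_subset, best_score

subset, acc = grasp_fss(["a", "b", "c", "noise"], toy_accuracy, seed=1)
print(sorted(subset), acc)
```

On this toy score every start ends with the attributes "a" and "c", because the improving stage removes the redundant "b" whenever the randomized construction happened to pick it.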
Knowledge Based Systems | 2014
Pablo Bermejo; José A. Gámez; José Miguel Puerta
This paper deals with the problem of wrapper feature subset selection (FSS) in classification-oriented datasets with a (very) large number of attributes. In high-dimensional datasets with thousands of variables, wrapper FSS becomes a laborious computational process because of the amount of CPU time it requires. In this paper we study how, under certain circumstances, the wrapper FSS process can be sped up by embedding the classifier into the wrapper algorithm instead of treating it as a black box. Our proposal is based on the combination of the NB classifier (which is known to be largely beneficial for FSS) with incremental wrapper FSS algorithms. The merit of this approach is analyzed both theoretically and experimentally, and the results show an impressive speed-up for the embedded FSS process.
Expert Systems With Applications | 2011
Pablo Bermejo; José A. Gámez; José Miguel Puerta
E-mail foldering, or e-mail classification into user-predefined folders, can be viewed as a text classification/categorization problem. However, it has some intrinsic properties that make it more difficult to deal with, mainly the large cardinality of the class variable (i.e. the number of folders), the different number of e-mails per class state and the fact that this is a dynamic problem, in the sense that e-mails arrive in our mail-folders following a time-line. Perhaps because of these problems, standard text-oriented classifiers such as Naive Bayes Multinomial do not obtain good accuracy when applied to e-mail corpora. In this paper, we identify the imbalance among classes/folders as the main problem, and propose a new method based on learning and sampling probability distributions. Our experiments over a standard corpus (ENRON) with seven datasets (e-mail users) show that the results obtained by Naive Bayes Multinomial significantly improve when applying the balancing algorithm first. For the sake of completeness in our experimental study we also compare this with another standard balancing method (SMOTE) and classifiers.
Computational Intelligence and Data Mining | 2009
Pablo Bermejo; José A. Gámez; José Miguel Puerta
This paper deals with the problem of wrapper-based feature subset selection in classification-oriented datasets with a (very) large number of attributes. In such datasets sophisticated search algorithms like beam search, branch and bound, best first, genetic algorithms, etc., become intractable in the wrapper approach due to the high number of wrapper evaluations to be carried out. One way to alleviate this problem is to use the so-called filter-wrapper approach or Incremental Wrapper-based Subset Selection (IWSS), which consists in constructing a ranking of the predictive attributes by using a filter measure and then running a wrapper approach guided by that ranking. In this way the number of wrapper evaluations is linear in the number of predictive attributes. In this paper we present a contribution to the IWSS approach which helps it to obtain more compact subsets: it allows not only the addition of new attributes but also their interchange with some of those already included in the selected subset. The disadvantage of this novelty is that it raises the worst-case complexity of IWSS to O(n^2); however, as in the case of the well-known sequential forward selection (SFS), the actual number of wrapper evaluations is considerably smaller. Empirical tests over 7 (biological) datasets with a large number of attributes demonstrate the success of the proposed approach when compared with both IWSS and SFS.
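The difference between plain IWSS and the variant with interchange can be illustrated with a short sketch. It assumes a strict-improvement acceptance rule and a toy scoring function; `iwss`, `iwss_with_replacement`, and `toy_accuracy` are hypothetical names, and the acceptance criterion used in the paper may differ.

```python
from typing import Callable, List, Sequence

def toy_accuracy(subset: Sequence[str]) -> int:
    """Stand-in for a wrapper evaluation (accuracy in percent)."""
    s = set(subset)
    acc = 50
    if "a" in s:
        acc += 20
    elif "b" in s:
        acc += 10
    if "c" in s:
        acc += 10
    return acc

def iwss(ranking: Sequence[str], score: Callable[[Sequence[str]], int]) -> List[str]:
    """Plain IWSS: one wrapper evaluation per ranked attribute, additions only."""
    selected: List[str] = []
    best = score(selected)
    for attr in ranking:
        acc = score(selected + [attr])
        if acc > best:
            selected.append(attr)
            best = acc
    return selected

def iwss_with_replacement(ranking: Sequence[str],
                          score: Callable[[Sequence[str]], int]) -> List[str]:
    """IWSS that also tries swapping the new attribute with each selected one."""
    selected: List[str] = []
    best = score(selected)
    for attr in ranking:
        # Swaps come first so that ties favor the smaller subset over an addition.
        candidates = [selected[:i] + [attr] + selected[i + 1:] for i in range(len(selected))]
        candidates.append(selected + [attr])
        top = max(candidates, key=score)
        if score(top) > best:
            selected, best = top, score(top)
    return selected

ranking = ["b", "a", "c", "noise"]  # assumed output of a filter ranking
print(iwss(ranking, toy_accuracy))                   # ['b', 'a', 'c']
print(iwss_with_replacement(ranking, toy_accuracy))  # ['a', 'c']
```

Because each ranked attribute is compared against every attribute already selected, the number of candidate subsets per step grows with the size of the selected subset, which is where the quadratic worst case comes from.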
International Journal of Pattern Recognition and Artificial Intelligence | 2011
Pablo Bermejo; José A. Gámez; José Miguel Puerta
This paper deals with the problem of feature subset selection in classification-oriented datasets with a (very) large number of attributes. In such datasets complex classical wrapper approaches become intractable due to the high number of wrapper evaluations to be carried out. One way to alleviate this problem is to use the so-called filter-wrapper approach or Incremental Wrapper-based Subset Selection (IWSS), which consists of constructing a ranking of the predictive attributes by using a filter measure and then applying a wrapper approach that follows the ranking. In this way the number of wrapper evaluations is linear in the number of predictive attributes. In this paper we present two contributions to the IWSS approach. The first one is related to obtaining more compact subsets, and enables not only the addition of new attributes but also their interchange with some of those already included in the selected subset. Our second contribution, termed early stopping, sets an adaptive threshold on the number of attributes in the ranking to be considered. The advantages of these new approaches are analyzed both theoretically and experimentally. The results over a set of 12 high-dimensional datasets corroborate the success of our proposals.
Conference on Multimedia Modeling | 2009
Pablo Bermejo; Hideo Joho; Joemon M. Jose; Robert Villa
Low-level features of multimedia content often have limited power to discriminate a document's relevance to a query. This motivated researchers to investigate other types of features. In this paper, we investigated four groups of features: low-level object features, behavioural features, vocabulary features, and window-based vocabulary features, to predict the relevance of shots in video retrieval. Search logs from two user studies formed the basis of our evaluation. The experimental results show that the window-based vocabulary features performed best. The behavioural features also showed a promising result, which is useful when the vocabulary features are not available. We also discuss the performance of classifiers.
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2010
Pablo Bermejo; José A. Gámez; José Miguel Puerta
This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. In such datasets sophisticated search algorithms like beam search, branch and bound, best first, genetic algorithms, etc., become intractable in the wrapper approach due to the high number of wrapper evaluations to be carried out. Thus, hybrid selection algorithms have recently appeared in the literature: based on a filter ranking, they perform an incremental wrapper selection over that ranking. Though working fine, these methods still have their own problems: (1) depending on the complexity of the wrapper search method, the number of wrapper evaluations can still be too large; and (2) they rely on a univariate ranking that does not take into account interaction between the variables already included in the selected subset and the remaining ones. In this paper we propose to work incrementally at two levels (block-level and attribute-level) in order to use a filter re-ranking method based on conditional mutual information, and the results show that we drastically reduce the number of wrapper evaluations without degrading the quality of the obtained subset (in fact we obtain the same accuracy while reducing the number of selected attributes).
Journal of General Practice | 2015
Pilar Orgaz; Pablo Bermejo; Pedro J. Tárraga; Miguel A. Tricio
Background: Evaluating health-related quality of life means knowing the impact a disease has on patients’ perception of their well-being. Health perception worsens with age and with an increasing number of chronic diseases, as occurs with metabolic syndrome, whose prevalence reaches 40-50% after menopause. Thus, knowing the repercussion of this disease on quality of life will make it possible to adopt strategies to improve health outcomes. Objective: To determine the health perception of menopausal women with metabolic syndrome and the factors involved. Methods: Cross-sectional study in Primary Health Care in the province of Cuenca (Spain). By random sampling, 400 menopausal women aged ≥45 years with metabolic syndrome (NCEP-ATP III) were selected. The SF-12 survey was used to evaluate general aspects of physical and mental health. Results: Health perception was fair for 52% but good for 40.5%. The mean score of the physical summary component was below the mean for the Spanish population, while the mental component was above the mean for patients over 65 years. ANOVA showed significant differences for the physical summary component for age (p<0.001). Women in the 45-54 age group obtained a better result for this component than those aged over 65. No differences were found for age groups in the mental component. Conclusions: Menopausal women with metabolic syndrome have a fair health perception, with physical health being worse than mental health. Physical health is better in younger women with formal education and without obesity, diabetes, osteoarthritis, osteoporosis or bronchial pathology. Mental health is better in women without hypertriglyceridaemia or psychic disease.
Clinical Medicine Insights: Oncology | 2015
Pablo Bermejo; Alicia Vivo; Pedro J. Tárraga; J. A. Rodríguez-Montes
Background: Traditional methods for deciding whether to recommend a patient for a prostate biopsy are based on cut-off levels of stand-alone markers such as prostate-specific antigen (PSA) or any of its derivatives. However, in the last decade we have seen the increasing use of predictive models that combine, in a non-linear manner, several predictors and are better able to predict prostate cancer (PC), but these fail to help the clinician to distinguish between PC and benign prostate hyperplasia (BPH) patients. We construct two new models that are capable of predicting both PC and BPH. Methods: An observational study was performed on 150 patients with PSA ≥3 ng/mL and age >50 years. We built a decision tree and a logistic regression model, validated with the leave-one-out methodology, in order to predict PC or BPH, or reject both. Results: Statistical dependence with PC and BPH was found for prostate volume (P-value < 0.001), PSA (P-value < 0.001), international prostate symptom score (IPSS; P-value < 0.001), digital rectal examination (DRE; P-value < 0.001), age (P-value < 0.002), antecedents (P-value < 0.006), and meat consumption (P-value < 0.08). The two predictive models that were constructed selected a subset of these, namely, volume, PSA, DRE, and IPSS, obtaining an area under the ROC curve (AUC) between 72% and 80% for both PC and BPH prediction. Conclusion: PSA and volume together help to build predictive models that accurately distinguish among PC, BPH, and patients without any of these pathologies. Our decision tree and logistic regression models outperform the AUC obtained in the compared studies. Using these models as decision support, the number of unnecessary biopsies might be significantly reduced.