Roberto Ruiz | Researchain

Publication

Featured research published by Roberto Ruiz.

Pattern Recognition | 2006

Incremental wrapper-based gene selection from microarray data for cancer classification

Roberto Ruiz; José C. Riquelme; Jesús S. Aguilar-Ruiz

Gene expression microarray is a rapidly maturing technology that provides the opportunity to assay the expression levels of thousands or tens of thousands of genes in a single experiment. We present a new heuristic to select relevant gene subsets for later use in the classification task. Our method is based on the statistical significance of adding a gene from a ranked list to the final subset. The efficiency and effectiveness of our technique are demonstrated through extensive comparisons with other representative heuristics. Our approach shows excellent performance, not only at identifying relevant genes, but also with respect to the computational cost.
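The ranked-list idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's statistical significance test is replaced here by a simple minimum-gain threshold (`min_gain`), and the function name, the `score` callback, and the toy gene weights are all hypothetical.

```python
from typing import Callable, List, Sequence


def incremental_ranked_selection(
    ranked: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Walk a ranked feature list once, keeping a feature only if it helps.

    `score` is a hypothetical callback that evaluates a candidate subset
    (for a wrapper, this would be a cross-validated classifier accuracy).
    `min_gain` is an assumed stand-in for a statistical significance test.
    """
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        value = score(candidate)
        # Accept the feature only if the gain over the current best is large enough.
        if value - best > min_gain:
            selected, best = candidate, value
    return selected


# Toy scoring function (made-up weights, for illustration only):
# each gene contributes a fixed amount, with a small penalty per gene.
TOY_WEIGHTS = {"g1": 0.6, "g3": 0.25, "g4": 0.01}


def toy_score(subset: List[str]) -> float:
    return sum(TOY_WEIGHTS.get(g, 0.0) for g in subset) - 0.001 * len(subset)


chosen = incremental_ranked_selection(["g1", "g2", "g3", "g4"], toy_score, min_gain=0.05)
print(chosen)
```

Because each ranked feature is evaluated once against the current subset, the number of subset evaluations grows linearly with the length of the ranking, which is one plausible reading of the computational-cost advantage the abstract mentions.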

Information Reuse and Integration | 2007

Detecting Fault Modules Applying Feature Selection to Classifiers

Daniel Rodríguez; Roberto Ruiz; Juan Jose Cuadrado-Gallego; Jesús S. Aguilar-Ruiz

At present, automated data collection tools allow us to collect large amounts of information, not without associated problems. In this paper, we apply feature selection to several software engineering databases, with the final aim of giving project managers a better global vision of the data they manage. We apply attribute selection techniques to different publicly available datasets (PROMISE repository) and use different data mining algorithms for classification to detect faulty modules. The results show that, in general, smaller datasets with fewer attributes maintain or improve the prediction capability of the original datasets.

Applied Soft Computing | 2012

Evolutionary Generalized Radial Basis Function neural networks for improving prediction accuracy in gene classification using feature selection

Francisco Fernández-Navarro; César Hervás-Martínez; Roberto Ruiz; José C. Riquelme

Radial Basis Function Neural Networks (RBFNNs) have been successfully employed in several function approximation and pattern recognition problems. The use of different RBFs in RBFNN has been reported in the literature and here the study centres on the use of the Generalized Radial Basis Function Neural Networks (GRBFNNs). An interesting property of the GRBF is that it can continuously and smoothly reproduce different RBFs by changing a real parameter τ. In addition, the mixed use of different RBF shapes in only one RBFNN is allowed. Generalized Radial Basis Function (GRBF) is based on Generalized Gaussian Distribution (GGD), which adds a shape parameter, τ, to standard Gaussian Distribution. Moreover, this paper describes a hybrid approach, Hybrid Algorithm (HA), which combines evolutionary and gradient-based learning methods to estimate the architecture, weights and node topology of GRBFNN classifiers. The feasibility and benefits of the approach are demonstrated by means of six gene microarray classification problems taken from bioinformatic and biomedical domains. Three filters were applied: Fast Correlation-Based Filter (FCBF), Best Incremental Ranked Subset (BIRS), and Best Agglomerative Ranked Subset (BARS); this was done in order to identify salient expression genes from among the thousands of genes in microarray data that can directly contribute to determining the class membership of each pattern. After different gene subsets were obtained, the proposed methodology was performed using the selected gene subsets as new input variables. The results confirm that the GRBFNN classifier leads to a promising improvement in accuracy.
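As a rough illustration of how a single shape parameter can move a basis function between different RBF shapes, the sketch below uses a generalized-Gaussian-style form, exp(-(d / r)^τ), where d is the distance to the centre. This exact formula, the function name `grbf`, and its parameter names are assumptions made for illustration; the abstract does not state the formula used in the paper.

```python
import math
from typing import Sequence


def grbf(x: Sequence[float], center: Sequence[float], radius: float, tau: float) -> float:
    """Generalized-Gaussian-style radial basis function (assumed form).

    Returns exp(-(||x - center|| / radius) ** tau).
    With tau = 2 this is a Gaussian-shaped bump, exp(-d^2 / r^2);
    with tau = 1 it decays like a Laplacian-shaped kernel, exp(-d / r).
    """
    dist = math.dist(x, center)  # Euclidean distance (Python 3.8+)
    return math.exp(-((dist / radius) ** tau))


# The same point evaluated under several shape parameters.
point, centre = (2.0, 0.0), (0.0, 0.0)
for tau in (1.0, 1.5, 2.0, 3.0):
    print(tau, grbf(point, centre, radius=1.0, tau=tau))
```

Under this assumed form, larger values of τ give a flatter top and a sharper drop-off beyond the radius, while smaller values give a peaked kernel with heavier tails.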

Information Sciences | 2012

Searching for rules to detect defective modules: A subgroup discovery approach

Daniel Rodríguez; Roberto Ruiz; José C. Riquelme; Jesús S. Aguilar-Ruiz

Data mining methods in software engineering are becoming increasingly important as they can support several aspects of the software development life-cycle such as quality. In this work, we present a data mining approach to induce rules extracted from static software metrics characterising fault-prone modules. Due to the special characteristics of the defect prediction data (imbalance, inconsistency, redundancy), not all classification algorithms are capable of dealing with this task conveniently. To deal with these problems, Subgroup Discovery (SD) algorithms can be used to find groups of statistically different data given a property of interest. We propose EDER-SD (Evolutionary Decision Rules for Subgroup Discovery), an SD algorithm based on evolutionary computation that induces rules describing only fault-prone modules. Rules are a well-known model representation that can be easily understood and applied by project managers and quality engineers. Thus, rules can help them to develop software systems that can be justifiably trusted. Contrary to other approaches in SD, our algorithm has the advantage of working with continuous variables, as the conditions of the rules are defined using intervals. We describe the rules obtained by applying our algorithm to seven publicly available datasets from the PROMISE repository, showing that they are capable of characterising subgroups of fault-prone modules. We also compare our results with three other well-known SD algorithms; EDER-SD performs well in most cases.
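To make the notion of an interval-based rule concrete, the sketch below represents a rule as a set of closed intervals over continuous metrics and scores it with weighted relative accuracy (WRAcc), a quality measure commonly used in subgroup discovery. This is an illustrative assumption, not EDER-SD itself: the abstract does not say which quality measure the algorithm optimises, and the metric names (`loc`, `cc`) and the data are made up.

```python
from typing import Dict, List, Tuple

# A rule maps a metric name to a closed interval [low, high].
Rule = Dict[str, Tuple[float, float]]
Module = Dict[str, float]


def covers(rule: Rule, module: Module) -> bool:
    """True if every metric named in the rule falls inside its interval."""
    return all(low <= module[metric] <= high for metric, (low, high) in rule.items())


def wracc(rule: Rule, modules: List[Module], defective: List[bool]) -> float:
    """Weighted relative accuracy: coverage * (precision - base defect rate)."""
    n = len(modules)
    covered = [flag for mod, flag in zip(modules, defective) if covers(rule, mod)]
    if not covered:
        return 0.0
    coverage = len(covered) / n
    precision = sum(covered) / len(covered)  # share of covered modules that are defective
    base_rate = sum(defective) / n
    return coverage * (precision - base_rate)


# Made-up modules described by lines of code and cyclomatic complexity.
modules = [
    {"loc": 500.0, "cc": 25.0},
    {"loc": 450.0, "cc": 30.0},
    {"loc": 40.0, "cc": 2.0},
    {"loc": 60.0, "cc": 3.0},
]
defective = [True, True, False, False]

# Hypothetical rule: "loc >= 400 and cc >= 20 implies fault-prone".
rule = {"loc": (400.0, float("inf")), "cc": (20.0, float("inf"))}
print(wracc(rule, modules, defective))
```

A rule that covers many modules but no more defective ones than the base rate scores zero, which reflects the goal of finding groups that are statistically different with respect to the property of interest.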

International Conference on Artificial Neural Networks | 2005

Heuristic search over a ranking for feature selection

Roberto Ruiz; José C. Riquelme; Jesús S. Aguilar-Ruiz

In this work, we suggest a new feature selection technique that lets us use the wrapper approach to find a well-suited feature set for distinguishing experiment classes in high-dimensional data sets. Our method is based on the relevance and redundancy idea, in the sense that a ranked feature is chosen if additional information is gained by adding it. This heuristic leads to considerably better accuracy results, in comparison to the full set and other representative feature selection algorithms, in twelve well-known data sets, coupled with notable dimensionality reduction.

Information & Software Technology | 2013

A study of subgroup discovery approaches for defect prediction

Daniel Rodríguez; Roberto Ruiz; José C. Riquelme; Rachel Harrison

Context: Although many papers have been published on software defect prediction techniques, machine learning approaches have yet to be fully explored. Objective: In this paper we suggest using a descriptive approach for defect prediction rather than the precise classification techniques that are usually adopted. This allows us to characterise defective modules with simple rules that can easily be applied by practitioners and deliver a practical (or engineering) approach rather than a highly accurate result. Method: We describe two well-known subgroup discovery algorithms, the SD algorithm and the CN2-SD algorithm, to obtain rules that identify defect-prone modules. The empirical work is performed with publicly available datasets from the Promise repository and object-oriented metrics from an Eclipse repository related to defect prediction. Subgroup discovery algorithms mitigate characteristics of datasets that hinder the applicability of classification algorithms and so remove the need for preprocessing techniques. Results: The results show that the generated rules can be used to guide testing effort in order to improve the quality of software development projects. Such rules can indicate metrics, their threshold values and relationships between metrics of defective modules. Conclusions: The induced rules are simple to use and easy to understand as they provide a description rather than a complete classification of the whole dataset. Thus this paper represents an engineering approach to defect prediction, i.e., an approach which is useful in practice, easily understandable and can be applied by practitioners.

Expert Systems With Applications | 2012

Fast feature selection aimed at high-dimensional data via hybrid-sequential-ranked searches

Roberto Ruiz; José C. Riquelme; Jesús S. Aguilar-Ruiz; Miguel García-Torres

We address the feature subset selection problem for classification tasks. We examine the performance of two hybrid strategies that directly search on a ranked list of features and compare them with two widely used algorithms, the fast correlation based filter (FCBF) and sequential forward selection (SFS). The proposed hybrid approaches provide the possibility of efficiently applying any subset evaluator, with a wrapper model included, to large and high-dimensional domains. The experiments performed show that our two strategies are competitive and can select a small subset of features without degrading the classification error or the advantages of the strategies under study.

Software Engineering and Advanced Applications | 2007

Attribute Selection in Software Engineering Datasets for Detecting Fault Modules

Daniel Rodríguez; Roberto Ruiz; Juan Jose Cuadrado-Gallego; Jesús S. Aguilar-Ruiz; Miguel Garre

Decision making has traditionally been based on managers' experience. At present, there are a number of software engineering (SE) repositories, and furthermore, automated data collection tools allow managers to collect large amounts of information, not without associated problems. On the one hand, such a large amount of information can overload project managers. On the other hand, a problem found in generic project databases, where the data is collected from different organizations, is the large disparity among their instances. In this paper, we characterize several software engineering databases by selecting attributes, with the final aim of giving project managers a better global vision of the data they manage. We use different data mining algorithms to select attributes from different publicly available datasets (PROMISE repository) and then use different classifiers to detect faulty modules. The results show that, in general, the smaller datasets maintain the prediction capability with a lower number of attributes than the original datasets.

International Conference on Computational Science | 2006

Segmentation of software engineering datasets using the M5 algorithm

Daniel Rodríguez; Juan José Cuadrado; Miguel A. Sicilia; Roberto Ruiz

This paper reports an empirical study that uses clustering techniques to derive segmented models from software engineering repositories, focusing on the improvement of the accuracy of estimates. In particular, we used two datasets obtained from the International Software Benchmarking Standards Group (ISBSG) repository and created clusters using the M5 algorithm. Each cluster is associated with a linear model. We then compare the accuracy of the estimates so generated with the classical multivariate linear regression and least median squares. Results show that there is an improvement in the accuracy of the results when using clustering. Furthermore, these techniques can help us to understand the datasets better; such techniques provide some advantages to project managers while keeping the estimation process within reasonable complexity.

Intelligent Data Analysis | 2005

Analysis of feature rankings for classification

Roberto Ruiz; Jesús S. Aguilar-Ruiz; José C. Riquelme; Norberto Díaz-Díaz

Different ways of contrasting the rankings generated by feature selection algorithms are presented in this paper, showing several possible interpretations depending on the approach taken in each study. We start from the premise that no single ideal subset exists for all cases. The purpose of these algorithms is to reduce the data set to its top-ranked attributes without losing predictive performance relative to the original data set. In this paper we propose a method, feature-ranking performance, to compare different feature-ranking methods, based on the Area Under Feature Ranking Classification Performance Curve (AURC). The conclusions and trends drawn from this paper support the performance of learning tasks in which some of the ranking algorithms studied here operate.
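One way to read the AURC idea is as the area under the curve of classifier accuracy against the number of top-ranked features used. The sketch below computes that area with the trapezoidal rule and normalises it to the range [0, 1]. The normalisation, the function name `aurc`, and the accuracy values are assumptions for illustration; the abstract does not give the exact definition used in the paper.

```python
from typing import Sequence


def aurc(accuracies: Sequence[float]) -> float:
    """Normalised area under an accuracy-versus-number-of-features curve.

    accuracies[k] is the (hypothetical) accuracy obtained with the first
    k + 1 features of a ranking. The area is computed with the trapezoidal
    rule on unit-spaced points and divided by the curve's width, so a
    ranking that reaches perfect accuracy from the first feature scores 1.0.
    """
    n = len(accuracies)
    if n < 2:
        raise ValueError("need accuracies for at least two subset sizes")
    area = sum((accuracies[i] + accuracies[i + 1]) / 2.0 for i in range(n - 1))
    return area / (n - 1)


# Two made-up rankings of the same three features: the first reaches
# high accuracy sooner, so its curve encloses more area.
fast_ranking = [0.80, 0.90, 0.90]
slow_ranking = [0.50, 0.70, 0.90]
print(aurc(fast_ranking), aurc(slow_ranking))
```

Both rankings end at the same accuracy with all features, yet the area separates them, which is the kind of comparison between ranking methods that the abstract describes.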
