Oscar Marbán | Researchain

Publication

Featured research published by Oscar Marbán.

Knowledge Engineering Review | 2010

A survey of data mining and knowledge discovery process models and methodologies

Gonzalo Mariscal; Oscar Marbán; Covadonga Fernández

Up to now, many data mining and knowledge discovery methodologies and process models have been developed, with varying degrees of success. In this paper, we describe the most used (in industrial and academic projects) and cited (in scientific literature) data mining and knowledge discovery methodologies and process models, providing an overview of their evolution throughout the history of data mining and knowledge discovery and setting down the state of the art in this topic. For every approach, we provide a brief description of the proposed knowledge discovery in databases (KDD) process, discussing its special features and its outstanding advantages and disadvantages. In addition, an overall comparison of all the presented data mining approaches is provided, focusing on the different steps and tasks into which each approach divides the whole KDD process. As a result of the comparison, we propose a new data mining and knowledge discovery process, named the refined data mining process, for developing any kind of data mining and knowledge discovery project. The refined data mining process is built on specific steps taken from the analyzed approaches.

Expert Systems With Applications | 2010

Clustering-based location in wireless networks

Luis Mengual; Oscar Marbán; Santiago Eibe

In this paper, we propose a three-phase methodology (measurement, calibration and estimation) for locating mobile stations (MS) in an indoor environment using wireless technology. Our solution is a fingerprint-based positioning system that overcomes the problem of the relative effect of doors and walls on signal strength and is independent of network device manufacturers. In the measurement phase, our system collects received signal strength indicator (RSSI) measurements from multiple access points. In the calibration phase, our system utilizes these measurements in a normalization process to create a radio map, a database of RSS patterns. Unlike traditional radio map-based methods, our methodology normalizes RSS measurements collected at different locations (on a floor) and uses artificial neural network models (ANNs) to group them into clusters. In the third phase, we use data mining techniques (clustering) to optimize location results. Experimental results demonstrate the accuracy of the proposed method. From these results it is clear that the system is highly likely to be able to locate an MS in a room or a nearby room.
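The idea of a normalized radio map can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the normalization or the ANN clustering, so this sketch assumes a simple normalization (subtracting the strongest reading) and swaps in a plain nearest-fingerprint lookup. The names `normalize`, `locate` and `radio_map`, and all RSS values, are hypothetical.

```python
import math

def normalize(rss):
    """Shift an RSS vector (one dBm value per access point) so its strongest
    reading becomes 0. Assumed normalization, chosen so that a constant
    per-device offset cancels out."""
    peak = max(rss)
    return [value - peak for value in rss]

def locate(radio_map, reading):
    """Return the room whose normalized fingerprint is closest (Euclidean
    distance) to the normalized reading. Nearest-neighbour matching stands
    in for the clustering step described in the abstract."""
    target = normalize(reading)

    def distance(room):
        return math.dist(normalize(radio_map[room]), target)

    return min(radio_map, key=distance)

# Hypothetical radio map: one stored fingerprint per room, three access points.
radio_map = {
    "room_a": [-40, -70, -80],
    "room_b": [-75, -45, -70],
    "room_c": [-80, -72, -42],
}

# A device that reports roughly 10 dB weaker than the calibration device.
reading = [-50, -79, -91]
print(locate(radio_map, reading))
```

Because both the stored fingerprints and the new reading are normalized the same way, a device that reports uniformly weaker signal still matches the fingerprint of the room it is in.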

Information Systems | 2009

Toward data mining engineering: A software engineering approach

Oscar Marbán; Javier Segovia; Ernestina Menasalvas; Covadonga Fernández-Baizán

The number, variety and complexity of projects involving data mining or knowledge discovery in databases activities have recently increased at such a pace that aspects related to their development process need to be standardized for results to be integrated, reused and interchanged in the future. Data mining projects are quickly becoming engineering projects, and current standard processes, like CRISP-DM, need to be revisited to incorporate this engineering viewpoint. This is the central motivation of this paper, which makes the point that experience gained about the software development process over almost 40 years could be reused and integrated to improve data mining processes. Consequently, this paper proposes to reuse ideas and concepts underlying the IEEE Std 1074 and ISO 12207 software engineering model processes to redefine and add to the CRISP-DM process and make it a data mining engineering standard.

Information Systems | 2008

A cost model to estimate the effort of data mining projects (DMCoMo)

Oscar Marbán; Ernestina Menasalvas; Covadonga Fernández-Baizán

CRISP-DM is the standard for developing Data Mining projects. CRISP-DM proposes the processes and tasks that have to be carried out to develop a Data Mining project. One task proposed by CRISP-DM is the cost estimation of the Data Mining project. In software development, many methods have been described to estimate the costs of project development (SLIM, SEER-SEM, PRICE-S and COCOMO). These methods are not appropriate for Data Mining projects because in Data Mining, software development is not the primary goal. Some methods have been proposed to estimate some phases of a Data Mining project, but there is no method to estimate the global cost of a generic Data Mining project. The lack of Data Mining project estimation methods is the cause of many real-life project failures, due to unrealistic estimation at the beginning of the projects. Consequently, in this paper we propose to design and validate a parametric cost estimation model, similar to COCOMO or SLIM in software development, for Data Mining projects (DMCoMo). The drivers of the model are proposed first, followed by the equation of the model.
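For readers unfamiliar with parametric cost models, the following Python sketch shows the general COCOMO-style form: a nominal effort a * size^b scaled by a product of cost-driver multipliers. It does not reproduce DMCoMo, whose drivers and equation are not given in this abstract. The function name `parametric_effort` is hypothetical, and the example uses the published Basic COCOMO "organic" coefficients (a = 2.4, b = 1.05) only to make the form concrete.

```python
def parametric_effort(size, a, b, multipliers=()):
    """COCOMO-style effort estimate: a * size**b, scaled by the product of
    the cost-driver multipliers (1.0 means a driver has no effect)."""
    effort = a * size ** b
    for multiplier in multipliers:
        effort *= multiplier
    return effort

# Basic COCOMO, organic mode: size in KLOC, effort in person-months.
nominal = parametric_effort(size=10, a=2.4, b=1.05)
print(round(nominal, 1))

# Hypothetical drivers: one raises effort by 15%, one lowers it by 10%.
adjusted = parametric_effort(size=10, a=2.4, b=1.05, multipliers=(1.15, 0.90))
print(round(adjusted, 1))
```

A model for data mining projects would need a different size measure and its own calibrated drivers, since lines of code are not the main product of such a project.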

IEEE International Conference on Fuzzy Systems | 2002

Subsessions: a granular approach to click path analysis

Ernestina Menasalvas; Socorro Millán; José M. Peña; Michael Hadjimichael; Oscar Marbán

Electronic, web-based commerce enables and demands the application of intelligent methods to analyze information collected from consumer web sessions. We propose a method of increasing the granularity of the user session analysis by isolating useful subsessions within web page access sessions, where each subsession represents a frequently traversed path indicating high-level user activity. The subsession approximates user state information as well as anticipated user activity, and as a result is useful for personalization and pre-caching.

Atlantic Web Intelligence Conference | 2003

Collaborative filtering using interval estimation naïve Bayes

Víctor Robles; Pedro Larrañaga; José M. Peña; Oscar Marbán; F. Javier Crespo; María S. Pérez

Personalized recommender systems can be classified into three main categories: content-based systems, mostly used to make suggestions depending on the text of the web documents; collaborative filtering systems, which use ratings from many users to suggest a document or an action to a given user; and hybrid solutions. In the collaborative filtering task we can find algorithms such as the naive Bayes classifier or some of its variants. However, the results of these classifiers can be improved, as we demonstrate through experimental results, with our new semi-naive Bayes approach based on intervals. In this work we present this new approach.

Expert Systems With Applications | 2013

Multi-agent location system in wireless networks

Luis Mengual; Oscar Marbán; Santiago Eibe; Ernestina Menasalvas

Highlights:
- A Multi-Agent Architecture and a methodology for indoor location are proposed.
- The Mobile Station will have a Fuzzy Location Agent (FLA) with minimal processing capacity.
- The FLA establishes its location on a plan of the floor of the building.
- The FLA communicates with the Fuzzy Location Manager Software Agent (FLMSA).
- The FLMSA uses fuzzy logic to estimate location based on a normalization process of RSS.

In this paper we propose a flexible Multi-Agent Architecture together with a methodology for indoor location which allows us to locate any mobile station (MS) such as a laptop, smartphone, tablet or robotic system in an indoor environment using wireless technology. Our technology is complementary to the GPS location finder, as it allows us to locate a mobile system in a specific room on a specific floor using the Wi-Fi networks. The idea is that any MS will have at its disposal an agent known as a Fuzzy Location Software Agent (FLSA) with minimal processing capacity, which collects the power received from different Access Points distributed around the floor and establishes its location on a plan of the floor of the building. In order to do so, it will have to communicate with the Fuzzy Location Manager Software Agent (FLMSA). The FLMSAs are local agents that form part of the management infrastructure of the Wi-Fi network of the organization. The FLMSA implements a location estimation methodology divided into three phases (measurement, calibration and estimation) for locating mobile stations (MS). Our solution is a fingerprint-based positioning system that overcomes the problem of the relative effect of doors and walls on signal strength and is independent of the network device manufacturer. In the measurement phase, our system collects received signal strength indicator (RSSI) measurements from multiple access points. In the calibration phase, our system uses these measurements in a normalization process to create a radio map, a database of RSS patterns. Unlike traditional radio map-based methods, our methodology normalizes RSS measurements collected at different locations on a floor. In the third phase, we use Fuzzy Controllers to locate an MS on the plan of the floor of a building. Experimental results demonstrate the accuracy of the proposed method. From these results it is clear that the system is highly likely to be able to locate an MS in a room or an adjacent room.

Journal of Systems and Software | 2004

Virtual reality systems estimation vs. traditional systems estimation

María I. Sánchez-Segura; Juan J. Cuadrado; Ana-María Moreno; Antonio Amescua; Angélica de Antonio; Oscar Marbán

This paper examines the problems of applying traditional function points count rules to virtual reality systems (VRS). From the analysis of the differences between traditional and VRS systems, a set of deficiencies in the IFPUG 4.1 function points count method was detected. Due to the increasing importance of these kinds of applications, it is necessary to study how traditional function points count rules can be adapted to estimate VRS. In this paper, we are going to focus on the possibility of estimating function points accurately using a proposed guideline which was successfully applied to estimate two VRS.

International Journal of Intelligent Systems | 2004

Subsessions: A granular approach to click path analysis

Ernestina Menasalvas; Socorro Millán; José M. Peña; Michael Hadjimichael; Oscar Marbán

The fiercely competitive web-based electronic commerce (e-commerce) environment has made necessary the application of intelligent methods to gather and analyze information collected from consumer web sessions. Knowledge about user behavior and session goals can be discovered from the information gathered about user activities, as tracked by web clicks. Most current approaches to customer behavior analysis study the user session by examining each web page access. However, the abstraction of subsessions provides a more granular view of user activity. Here, we propose a method of increasing the granularity of the user session analysis by isolating useful subsessions within sessions. Each subsession represents a high-level user activity such as performing a purchase or searching for a particular type of information. Given a set of previously identified subsessions, we can determine at which point the user begins a preidentified subsession by tracking user clicks. With this information we can (1) optimize the user experience by precaching pages or (2) provide an adaptive user experience by presenting pages according to our estimation of the user's ultimate goal. To identify subsessions, we present an algorithm to compute frequent click paths from which subsessions can then be isolated. The algorithm functions by scanning all user sessions and extracting all frequent subpaths, using a distance function to determine subpath similarity. Each frequent subpath represents a subsession. An analysis of the pages represented by the subsession provides additional information about semantically related activities commonly performed by users.
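The notion of a frequent click subpath can be made concrete with a short Python sketch. It is a simplification and not the algorithm from the paper: the abstract describes a distance function for subpath similarity without defining it, so this sketch counts only exactly matching contiguous subpaths of a fixed length. The function name `frequent_subpaths`, its parameters and the sample sessions are all hypothetical.

```python
from collections import Counter

def frequent_subpaths(sessions, length, min_support):
    """Return contiguous click subpaths of the given length that occur in at
    least min_support sessions, mapped to the number of sessions containing
    them. Exact matching replaces the paper's similarity distance."""
    counts = Counter()
    for session in sessions:
        # A set ensures each subpath is counted at most once per session.
        seen = {
            tuple(session[i:i + length])
            for i in range(len(session) - length + 1)
        }
        counts.update(seen)
    return {path: n for path, n in counts.items() if n >= min_support}

# Hypothetical sessions: each is the ordered list of pages a user clicked.
sessions = [
    ["home", "search", "item", "cart", "pay"],
    ["home", "item", "cart", "pay"],
    ["search", "item", "cart", "help"],
]

result = frequent_subpaths(sessions, length=3, min_support=2)
for path, support in sorted(result.items()):
    print(" > ".join(path), support)
```

Each returned subpath is a candidate subsession; a site could start precaching the remaining pages as soon as a user's clicks match the beginning of one.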

Journal of Information Technology & Software Engineering | 2013

Extending UML for Modeling Data Mining Projects (DM-UML)

Oscar Marbán; Javier Segovia

Existing Data Mining process models propose one way or another of developing projects in a structured manner, trying to reduce their complexity through effective project management. It is well known in any engineering environment that one of the management tasks that helps to reduce project problems is systematic project documentation, but few of the existing Data Mining processes propose their documentation. Furthermore, these few note the need to produce documentation at each phase as an input for the next, but they do not show how to do it. On the other hand, in the literature there are examples of UML extensions for data mining projects, but they always focus on the model implementation side and fail to take into account the remainder of the process. In this paper, we present an extension of the UML modeling language for data mining projects (DM-UML) covering all the documentation needs for a project conforming to a standard process, namely CRISP-DM, ranging from business understanding to deployment. We also show an example of a real application of the proposed DM-UML modeling. The result of this approach is that, besides the advantages of having a standardized way of producing the documentation, it clearly constitutes a very useful and transparent tool for modeling and connecting the business understanding or modeling phase with the remainder of the project right through to deployment, as well as a way of facilitating communication with the nontechnical stakeholders involved in the project, problems which have always been an open question in data mining.
