
Publication


Featured research published by Jose Zubcoff.


Data and Knowledge Engineering | 2007

A UML 2.0 profile to design Association Rule mining models in the multidimensional conceptual modeling of data warehouses

Jose Zubcoff; Juan Trujillo

By using data mining techniques, the data stored in a Data Warehouse (DW) can be analyzed for the purpose of uncovering and predicting hidden patterns within the data. So far, different approaches have been proposed to accomplish the conceptual design of DWs by following the multidimensional (MD) modeling paradigm. In previous work, we proposed a UML profile for DWs enabling the specification of the main MD properties at the conceptual level. This paper presents a novel approach to integrating data mining models into multidimensional models in order to accomplish the conceptual design of DWs with Association Rules (AR). To this end, we extend our previous work by providing another UML profile that allows us to specify Association Rule mining models for DWs at the conceptual level in a clear and expressive way. The main advantage of our proposal is that the Association Rules rely on the goals and user requirements of the Data Warehouse, instead of the traditional method of specifying Association Rules by considering only the final database implementation structures such as tables, rows or columns. In this way, ARs are specified in the early stages of a DW project, thus reducing development time and cost. Finally, in order to show the benefits of our approach, we have implemented the specified Association Rules on a commercial database management server.
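The paper above models association rules at the conceptual level rather than computing them, but the underlying mining technique can be sketched in a few lines. The transactions, item names and thresholds below are illustrative examples, not taken from the paper:

```python
from itertools import combinations

# Toy transaction data: each row is the set of products bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def frequent_itemsets(transactions, min_support=0.4):
    """Count support for every itemset up to size 2, keeping the frequent ones."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in (1, 2):
        for candidate in combinations(items, size):
            support = sum(set(candidate) <= t for t in transactions) / n
            if support >= min_support:
                frequent[candidate] = support
    return frequent

def association_rules(frequent, min_confidence=0.7):
    """Derive A -> B rules from frequent pairs: confidence = support(A,B) / support(A)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) != 2:
            continue
        a, b = itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / frequent[(lhs,)]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(confidence, 2)))
    return rules

freq = frequent_itemsets(transactions)
print(association_rules(freq))  # includes ('beer', 'diapers', 1.0)
```

The paper's point is that the antecedent/consequent and thresholds of such rules should be specified on the MD conceptual model (facts, dimensions) instead of directly on tables and columns as done here.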


Data Warehousing and Knowledge Discovery | 2007

Integrating clustering data mining into the multidimensional modeling of data warehouses with UML profiles

Jose Zubcoff; Jesús Pardillo; Juan Trujillo

Clustering can be considered the most important unsupervised learning technique, finding groups of similar behaviors (clusters) in large collections of data. Data warehouses (DWs) can help users analyze stored data, because they contain data preprocessed for analysis purposes. Furthermore, the multidimensional (MD) model of DWs intuitively represents the underlying system. However, most clustering data-mining techniques are applied at a low level of abstraction to complex unstructured data. While there are several approaches to clustering on DWs, there is still no conceptual model for clustering that facilitates modeling with this technique on the multidimensional (MD) model of a DW. Here, we propose (i) a conceptual model for clustering that helps focus the data-mining process at the adequate abstraction level, and (ii) an extension of the Unified Modeling Language (UML), by means of the UML profiling mechanism, that allows us to design clustering data-mining models on top of the MD model of a DW. This allows us to avoid duplicating the time-consuming preprocessing stage and simplifies the design of clustering on top of DWs, improving knowledge discovery.
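As a rough illustration of the clustering technique being modeled here (not of the UML profile itself), the following is a minimal Lloyd's-algorithm k-means sketch on made-up 2-D points:

```python
# Minimal k-means sketch: assign each point to its nearest centroid, then
# recompute each centroid as the mean of its assigned points.

def kmeans(points, centroids, iterations=10):
    clusters = {}
    for _ in range(iterations):
        clusters = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in clusters.items()
        ]
    return centroids, clusters

# Two visually obvious groups of toy points.
points = [(1, 1), (1.5, 2), (1, 0.6), (8, 8), (9, 9), (8, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```

In the paper's setting, the inputs to such an algorithm (which measures and dimension attributes to cluster on) would be specified declaratively on the MD model rather than hand-coded.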


EDBT/ICDT Workshops | 2012

Open business intelligence: on the importance of data quality awareness in user-friendly data mining

Jose-Norberto Mazón; Jose Zubcoff; Irene Garrigós; Roberto Espinosa; Rolando Rodríguez

Citizens demand more and more data for making decisions in their daily lives. Therefore, mechanisms that allow citizens to understand and analyze linked open data (LOD) in a user-friendly manner are highly required. To this aim, the concept of Open Business Intelligence (OpenBI) is introduced in this position paper. OpenBI enables non-expert users to (i) analyze and visualize LOD, thus generating actionable information by means of reporting, OLAP analysis, dashboards or data mining; and (ii) share the newly acquired information as LOD to be reused by anyone. One of the most challenging issues of OpenBI is related to data mining, since non-experts (such as citizens) need guidance during preprocessing and the application of mining algorithms, due to the complexity of the mining process and the low quality of the data sources. This is even worse when dealing with LOD, not only because of the different kinds of links among data, but also because of its high dimensionality. As a consequence, in this position paper we advocate that data mining for OpenBI requires data-quality-aware mechanisms for guiding non-expert users in obtaining and sharing the most reliable knowledge from the available LOD.


Data Warehousing and Knowledge Discovery | 2005

Extending the UML for designing association rule mining models for data warehouses

Jose Zubcoff; Juan Trujillo

Association rules (AR) are one of the most popular data mining techniques for searching databases for frequently occurring patterns. In this paper, we present a novel approach to accomplish the conceptual design of data warehouses together with data mining association rules, allowing us to implement the association rules defined in the conceptual modeling phase. The great advantage of our approach is that the association rules are specified from the early stages of a data warehouse project, based on the main end-user requirements and data warehouse goals, instead of being specified on the final database implementation structures such as tables, rows or columns. Finally, to show the benefit of our approach, we implement the specified association rules on a commercial data warehouse management server.


Data Warehousing and Knowledge Discovery | 2006

Conceptual modeling for classification mining in data warehouses

Jose Zubcoff; Juan Trujillo

Classification is a data mining (DM) technique that generates classes, allowing analysts to predict and describe the behavior of a variable based on the characteristics of a dataset. Frequently, DM analysts need to classify large amounts of data using many attributes. Thus, data warehouses (DWs) can play an important role in the DM process, because they can easily manage huge quantities of data. There are two standards used to model mining techniques: the Common Warehouse Metamodel (CWM) and the Predictive Model Markup Language (PMML), focused on metadata interchange and sharing, respectively. These standards do not take advantage of the underlying, semantically rich multidimensional (MD) model, which could save development time and cost. In this paper, we present a conceptual model for classification and a UML profile that allows the design of classification on MD models. Our goal is to facilitate the design of these mining models in a DW context by employing an expressive conceptual model that can be used on top of an MD model. Finally, using the designed profile, we implement a case study in a standard database system and show the results.
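To make the classification technique concrete (independently of the UML profile the paper contributes), here is a toy one-attribute "decision stump" learner; the attribute names and rows are invented for illustration:

```python
# Toy classifier: for each attribute, predict the majority label per attribute
# value, and keep the attribute whose rule classifies the training rows best.

def train_stump(rows, attributes, label):
    best = None
    for attr in attributes:
        rule = {}
        for value in {r[attr] for r in rows}:
            labels = [r[label] for r in rows if r[attr] == value]
            rule[value] = max(set(labels), key=labels.count)
        accuracy = sum(rule[r[attr]] == r[label] for r in rows) / len(rows)
        if best is None or accuracy > best[2]:
            best = (attr, rule, accuracy)
    return best

rows = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
]
attr, rule, acc = train_stump(rows, ["outlook", "windy"], "play")
print(attr, rule, acc)  # "outlook" separates the classes perfectly here
```

In the DW setting the paper targets, the predicted variable and input attributes would be picked from the facts and dimensions of the MD model rather than from raw column names.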


International Conference on Web Engineering | 2012

Extracting models from web API documentation

Rolando Rodríguez; Roberto Espinosa; Devis Bianchini; Irene Garrigós; Jose-Norberto Mazón; Jose Zubcoff

In order to develop web mashups, designers need an in-depth understanding of each Web API they are using. However, Web API documentation is rather heterogeneous, represented by big HTML files or collections of files in which it is difficult to identify elements such as API methods and how they can be invoked. Models have been widely recognized as first-class artifacts for documenting software applications, abstracting from implementation details, thus becoming good candidates for raising the level of automation of web mashup development. In this paper we present an approach for extracting models from Web API documentation. Our contributions are (i) a metamodel for standardizing the information extracted from Web API documentation; and (ii) a method for extracting models by parsing the HTML files containing the Web API documentation, discovering useful data, and automatically generating the corresponding models (which conform to the defined metamodel).
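The extraction idea can be sketched with Python's standard-library HTML parser: scan a documentation page and pull candidate API methods out of headings. The page layout and heuristic below are made-up examples; the paper's metamodel and discovery rules are more elaborate:

```python
from html.parser import HTMLParser

class MethodHeadingParser(HTMLParser):
    """Collect headings that look like HTTP operations, e.g. 'GET /photos'."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.methods = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in ("h2", "h3"):
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        # Heuristic: headings starting with an HTTP verb become API methods.
        if self.in_heading and text.startswith(("GET ", "POST ", "PUT ", "DELETE ")):
            verb, _, path = text.partition(" ")
            self.methods.append({"verb": verb, "path": path})

# A hypothetical documentation page.
page = """
<html><body>
  <h1>Example Photo API</h1>
  <h2>GET /photos</h2><p>List photos.</p>
  <h2>POST /photos</h2><p>Upload a photo.</p>
  <h3>Rate limits</h3>
</body></html>
"""

parser = MethodHeadingParser()
parser.feed(page)
print(parser.methods)
```

The extracted dictionaries stand in for instances of the paper's metamodel; real documentation pages would need many more such heuristics.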


International Conference on Computational Science and Its Applications | 2011

A set of experiments to consider data quality criteria in classification techniques for data mining

Roberto Espinosa; Jose Zubcoff; Jose-Norberto Mazón

A successful data mining process depends on the data quality of the sources in order to obtain reliable knowledge. Therefore, preprocessing the data is required to deal with data quality criteria. However, preprocessing has traditionally been seen as a time-consuming and non-trivial task, since data quality criteria have to be considered without any guide to how they affect the data mining process. To overcome this situation, in this paper we propose to analyze data mining techniques in order to understand how different data quality criteria in the sources affect the results of the algorithms. To this aim, we have conducted a set of experiments to assess three data quality criteria: completeness, correlation and balance of data. This work is a first step towards considering, in a systematic and structured manner, data quality criteria for supporting and guiding data miners in obtaining reliable knowledge.
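The three criteria named in the abstract admit simple, standard definitions; the formulas below are textbook versions on toy columns, not necessarily the exact measures used in the experiments:

```python
from math import sqrt

def completeness(column):
    """Fraction of non-missing values in a column."""
    return sum(v is not None for v in column) / len(column)

def pearson(x, y):
    """Pearson correlation between two numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def balance(labels):
    """Ratio of the rarest class to the most frequent one (1.0 = balanced)."""
    counts = [labels.count(c) for c in set(labels)]
    return min(counts) / max(counts)

print(completeness([1, None, 3, 4]))        # 0.75
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # ~1.0 (perfectly linear)
print(balance(["a", "a", "a", "b"]))        # ~0.33 (imbalanced)
```

Measures like these let a data miner quantify how degraded a source is before running a mining algorithm on it, which is the premise of the experiments.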


2013 XXXIX Latin American Computing Conference (CLEI) | 2013

Towards a data quality model for open data portals

Edgar Oviedo; Jose-Norberto Mazón; Jose Zubcoff

Data that can be reused and redistributed without any restriction is called Open Data. These same two features mean that its quality can be greatly affected. To date, the most used quality criteria for Open Data are those established in the 5-Stars Model. This article aims to extend that model and corroborate the existence of specific quality criteria for Open Data, together with their corresponding measurement mechanisms. We propose a new quality model for Open Data portals, presented from two points of view: qualitative and quantitative. To illustrate the use of this model, we implemented a case study based on real open data about the Municipality of Perez Zeledon in Costa Rica, which was evaluated with the qualitative model.
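The 5-Stars Model the article extends is Berners-Lee's scheme for open data publication, where each level adds a requirement on top of the previous one (open licence, structured, non-proprietary format, URIs, links to other data). A minimal rating sketch, with hypothetical dataset fields and assuming the dataset is already openly licensed, could look like:

```python
# Star level by publication format, assuming an open licence (level 1's
# requirement) already holds. The field names here are hypothetical.
FORMAT_STARS = {
    "pdf": 1,  # open, but unstructured
    "xls": 2,  # structured, proprietary format
    "csv": 3,  # structured, non-proprietary format
    "rdf": 4,  # uses URIs to identify things
}

def five_star_rating(dataset):
    """Return the star level (1-5) of a published dataset."""
    stars = FORMAT_STARS.get(dataset["format"], 1)
    if stars == 4 and dataset.get("linked_to_other_data"):
        stars = 5  # linked data: links out to other datasets
    return stars

print(five_star_rating({"format": "csv"}))                                # 3
print(five_star_rating({"format": "rdf", "linked_to_other_data": True}))  # 5
```

The article's contribution is precisely that format-level stars like these are not enough, motivating additional qualitative and quantitative criteria for portals.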


International Symposium on Data-Driven Process Discovery and Analysis | 2013

Enabling Non-expert Users to Apply Data Mining for Bridging the Big Data Divide

Roberto Espinosa; Diego García-Saiz; Marta E. Zorrilla; Jose Zubcoff; Jose-Norberto Mazón

Non-expert users find it complex to gain richer insights into the increasing amount of available heterogeneous data, the so-called big data. Advanced data analysis techniques, such as data mining, are difficult to apply because (i) a great number of data mining algorithms can be applied to solve the same problem, and (ii) correctly applying data mining techniques always requires dealing with the inherent features of the data source. Therefore, we are witnessing a novel scenario in which non-experts are unable to take advantage of big data while data mining experts are: the big data divide. In order to bridge this gap, we propose an approach that offers non-expert miners a tool that, just by uploading their data sets, returns the most accurate mining pattern without having to deal with algorithms or settings, thanks to the use of a data mining algorithm recommender. We also incorporate a prior task to help non-expert users specify data mining requirements, and a later task in which users are guided in interpreting data mining results. Furthermore, we experimentally test the feasibility of our approach, in particular the method to build recommenders, in an educational context where instructors of e-learning courses are non-expert data miners who need to discover how their courses are used in order to make informed decisions to improve them.
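One simple form the recommender idea can take is to evaluate several candidate algorithms on a held-out split of the uploaded data set and recommend the most accurate one. The candidates below are deliberately trivial stand-ins for real algorithms, and the data is invented:

```python
# Two toy candidate classifiers over (feature, label) pairs.

def majority_class(train, test):
    """Always predict the most common training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return [guess for _ in test]

def one_nearest_neighbour(train, test):
    """Predict the label of the closest training feature."""
    return [min(train, key=lambda t: abs(t[0] - x))[1] for x in test]

def recommend(dataset, candidates, split=0.75):
    """Score each candidate on a holdout split; return the best and all scores."""
    cut = int(len(dataset) * split)
    train, holdout = dataset[:cut], dataset[cut:]
    scores = {}
    for name, clf in candidates.items():
        predictions = clf(train, [x for x, _ in holdout])
        scores[name] = sum(p == y for p, (_, y) in zip(predictions, holdout)) / len(holdout)
    return max(scores, key=scores.get), scores

# Small values -> "low", large values -> "high".
dataset = [(1, "low"), (2, "low"), (3, "low"), (8, "high"),
           (9, "high"), (2, "low"), (10, "high"), (1, "low")]
best, scores = recommend(dataset, {"majority": majority_class, "1-nn": one_nearest_neighbour})
print(best, scores)
```

The paper's recommender additionally accounts for data-source characteristics (such as the quality criteria studied in the authors' earlier work) rather than accuracy on a single split alone.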


International Conference on Geoinformatics | 2009

A model driven framework for geographic knowledge discovery

Octavio Glorio; Jose Zubcoff; Juan Trujillo

Geographic knowledge discovery (GKD) is the process of extracting information and knowledge from massive georeferenced databases. Usually the process is accomplished by two different systems, Geographic Information Systems (GIS) and data mining engines. However, the development of those systems is a complex task because it does not follow a systematic, integrated and standard methodology. To overcome these pitfalls, in this paper we propose a modeling framework that addresses the development of the different parts of a multilayer GKD process. The main advantages of our framework are that: (i) it reduces the design effort, (ii) it improves the quality of the systems obtained, (iii) it is independent of platforms, (iv) it facilitates the use of data mining techniques on georeferenced data, and finally, (v) it improves communication between different users.
