Petar Jovanovic
Polytechnic University of Catalonia
Publications
Featured research published by Petar Jovanovic.
Information Systems | 2014
Petar Jovanovic; Oscar Romero; Alkis Simitsis; Alberto Abelló; Daria Mayorova
Designing data warehouse (DW) systems in highly dynamic enterprise environments is not an easy task. At each moment, the multidimensional (MD) schema needs to satisfy the set of information requirements posed by the business users. At the same time, the diversity and heterogeneity of the data sources need to be considered in order to properly retrieve the needed data. The frequent arrival of new business needs requires that the system be adaptable to changes. To cope with such inevitable complexity (both at the beginning of the design process and when potential evolution events occur), in this paper we present a semi-automatic method called ORE for creating DW designs in an iterative fashion based on a given set of information requirements. Requirements are first considered separately. For each requirement, ORE expects the set of possible MD interpretations of the source data needed for that requirement (in a form similar to an MD schema). Incrementally, ORE builds a unified MD schema that satisfies the entire set of requirements and meets predefined quality objectives. We have implemented ORE and performed a number of experiments to study our approach. We have also conducted a limited-scale case study to investigate its usefulness to designers.
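To make the incremental construction concrete, below is a minimal Python sketch of folding per-requirement MD interpretations into one unified schema. The names (MDInterpretation, unify) and the simple reuse-by-fact-name merge policy are illustrative assumptions, not the actual ORE algorithm, which also weighs quality objectives.

```python
# Hypothetical sketch of ORE-style incremental MD schema consolidation.
# All names and the merge policy are illustrative, not from the paper.
from dataclasses import dataclass, field

@dataclass
class MDInterpretation:
    """One requirement's multidimensional view of the sources."""
    fact: str                                   # central fact (e.g., "Sales")
    measures: set = field(default_factory=set)
    dimensions: set = field(default_factory=set)

def unify(requirements):
    """Incrementally fold each requirement into a unified MD schema,
    reusing facts and dimensions that already exist in it."""
    schema = {}
    for req in requirements:
        if req.fact in schema:                  # reuse the existing fact
            schema[req.fact].measures |= req.measures
            schema[req.fact].dimensions |= req.dimensions
        else:                                   # introduce a new fact
            schema[req.fact] = MDInterpretation(
                req.fact, set(req.measures), set(req.dimensions))
    return schema

reqs = [
    MDInterpretation("Sales", {"revenue"}, {"Date", "Store"}),
    MDInterpretation("Sales", {"quantity"}, {"Date", "Product"}),
]
print(unify(reqs))  # one Sales fact with merged measures and dimensions
```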
Data Warehousing and Knowledge Discovery | 2012
Petar Jovanovic; Oscar Romero; Alkis Simitsis; Alberto Abelló
Data warehouse (DW) design is based on a set of requirements expressed as service level agreements (SLAs) and business level objects (BLOs). Populating a DW system from a set of information sources is realized with extract-transform-load (ETL) processes based on SLAs and BLOs. The entire task is complex, time consuming, and hard to perform manually. This paper presents our approach to the requirement-driven creation of ETL designs. Each requirement is considered separately and a respective ETL design is produced. We propose an incremental method for consolidating these individual designs and creating an ETL design that satisfies all given requirements. Finally, the design produced is sent to an ETL engine for execution. We illustrate our approach with an example based on TPC-H and report experimental findings that show the effectiveness and quality of our approach.
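As a rough illustration of consolidating per-requirement ETL designs, the sketch below merges linear flows into a tree by reusing a shared prefix of operations; the operation encoding and merge rule are invented for the example and much simpler than the paper's method.

```python
# Illustrative sketch (names are mine, not the paper's): consolidating
# per-requirement ETL designs by sharing a common prefix of operations.

def consolidate(flows):
    """Merge linear ETL flows into a tree, reusing shared leading ops."""
    root = {}
    for flow in flows:
        node = root
        for op in flow:
            node = node.setdefault(op, {})   # reuse if present, else branch
        node["<sink>"] = {}
    return root

flow_a = ["extract(orders)", "filter(year=2024)", "join(customers)", "load(dw)"]
flow_b = ["extract(orders)", "filter(year=2024)", "aggregate(sum)", "load(dw)"]
print(consolidate([flow_a, flow_b]))  # shared extract+filter, two branches
```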
International Conference on Data Engineering | 2014
Petar Jovanovic; Alkis Simitsis; Kevin Wilkinson
A complex analytic flow in a modern enterprise may perform multiple, logically independent tasks, where each task uses a different processing engine. We term these multi-engine flows hybrid flows. Using multiple processing engines has advantages such as rapid deployment, better performance, and lower cost. However, as the number and variety of these engines grow, developing and maintaining hybrid flows is a significant challenge: they are specified at a physical level, and so are hard to design and may break as the infrastructure evolves. We address this problem by enabling flow design at a logical level and automatic translation to physical flows. There are three main challenges. First, we describe how flows can be represented at a logical level, abstracting away details of any underlying processing engine. Second, we show how a physical flow, expressed in a programming language or some design GUI, can be imported and converted to a logical flow. In particular, we show how a hybrid flow comprising subflows in different languages can be imported and composed as a single, logical flow for subsequent manipulation. Third, we describe how a logical flow is translated into one or more physical flows for execution by the processing engines. The paper concludes with experimental results and example transformations that demonstrate the correctness and utility of our system.
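The following toy sketch, under heavy assumptions (a three-operator logical vocabulary and invented per-engine templates), illustrates the general shape of a logical-to-physical translation step like the one described above.

```python
# A toy sketch of logical-to-physical flow translation; the operator
# vocabulary and engine templates are invented for illustration.

LOGICAL_FLOW = ["scan:logs", "filter:status=500", "aggregate:count"]

TEMPLATES = {
    "sql":   {"scan": "SELECT * FROM {arg}",
              "filter": "WHERE {arg}",
              "aggregate": "GROUP BY ... {arg}"},
    "spark": {"scan": "spark.read.table('{arg}')",
              "filter": ".filter('{arg}')",
              "aggregate": ".agg({{'{arg}': 'count'}})"},
}

def to_physical(flow, engine):
    """Translate each logical operator into the target engine's dialect."""
    out = []
    for step in flow:
        op, arg = step.split(":", 1)
        out.append(TEMPLATES[engine][op].format(arg=arg))
    return out

print(to_physical(LOGICAL_FLOW, "spark"))
```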
Data Warehousing and OLAP | 2012
Petar Jovanovic; Oscar Romero; Alkis Simitsis; Alberto Abelló
The design of a data warehouse (DW) depends heavily on the information requirements of its business users. However, tailoring a DW design that satisfies all business requirements is not an easy task. In addition, complex and evolving business environments result in a continuous emergence of new or changed business needs. Furthermore, to build a correct multidimensional (MD) schema for a DW, the designer must deal with the semantics and heterogeneity of the underlying data sources. To cope with such inevitable complexity, both at the beginning of the design process and when a potential evolution event occurs, in this paper we present a semi-automatic method, named ORE, for constructing the MD schema in an iterative fashion based on the information requirements. In our approach, we consider each requirement separately and incrementally build the unified MD schema satisfying the entire set of requirements.
Data Warehousing and OLAP | 2015
Rizkallah Touma; Oscar Romero; Petar Jovanovic
Data integration aims to facilitate the exploitation of heterogeneous data by providing the user with a unified view of data residing in different sources. Currently, ontologies are commonly used to represent this unified view in terms of a global target schema, due to their flexibility and expressiveness. However, most approaches still assume a predefined target schema and focus on generating the mappings between this schema and the sources. In this paper, we propose a solution that supports data integration tasks by employing semi-automatic ontology construction to create a target schema on the fly. To that end, we revisit existing ontology extraction, matching, and merging techniques and integrate them into a single end-to-end system. Moreover, we extend the techniques used with the automatic generation of mappings between the extracted ontologies and the underlying data sources. Finally, to demonstrate the usefulness of our solution, we integrate it with an independent data integration system.
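A miniature, hypothetical version of the extract-match-merge pipeline might look as follows: per-source "ontologies" are reduced to classes with attribute sets, classes are matched by attribute overlap, matches are merged into a target schema, and source mappings are recorded. The Jaccard threshold and all names are arbitrary choices for the demo.

```python
# Hypothetical end-to-end miniature of the pipeline the paper describes:
# extract per-source "ontologies", match classes by attribute overlap,
# merge matches into a target schema, and keep source mappings.

def jaccard(a, b):
    return len(a & b) / len(a | b)

src1 = {"Client": {"name", "email", "city"}}
src2 = {"Customer": {"name", "email", "phone"}}

target, mappings = {}, []
for c1, attrs1 in src1.items():
    for c2, attrs2 in src2.items():
        if jaccard(attrs1, attrs2) >= 0.4:       # match threshold (arbitrary)
            target[c1] = attrs1 | attrs2         # merged class, first label kept
            mappings += [(c1, "src1", c1), (c1, "src2", c2)]

print(target)    # merged target schema
print(mappings)  # target class -> source class mappings
```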
Extending Database Technology | 2015
Petar Jovanovic; Oscar Romero; Alkis Simitsis; Alberto Abelló; Héctor Candón; Sergi Nadal
The design lifecycle of a data warehousing (DW) system is primarily led by requirements of its end-users and the complexity of underlying data sources. The process of designing a multidimensional (MD) schema and back-end extract-transform-load (ETL) processes is a long-term and mostly manual task. As enterprises shift to more real-time and 'on-the-fly' decision making, business intelligence (BI) systems require automated means for efficiently adapting a physical DW design to frequent changes of business needs. To address this problem, we present Quarry, an end-to-end system for assisting users of various technical skills in managing the incremental design and deployment of MD schemata and ETL processes. Quarry automates the physical design of a DW system from high-level information requirements. Moreover, Quarry provides tools for efficiently accommodating MD schema and ETL process designs to new or changed information needs of its end-users. Finally, Quarry facilitates the deployment of the generated DW design over an extensible list of execution engines. On-site, we will use a variety of examples to show how Quarry manages the complexity of the DW design lifecycle.
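One piece of such a system, the "extensible list of execution engines", can be sketched as a plugin registry; the engine names and deploy API below are purely illustrative and not Quarry's actual interface.

```python
# Sketch of an extensible deployment-engine registry, so a generated
# design can target any registered backend. Names are illustrative only.

ENGINES = {}

def register(name):
    def wrap(fn):
        ENGINES[name] = fn
        return fn
    return wrap

@register("postgres")
def deploy_postgres(design):
    return f"CREATE TABLE {design['fact']} (...);"

@register("hive")
def deploy_hive(design):
    return f"CREATE EXTERNAL TABLE {design['fact']} (...);"

design = {"fact": "sales"}
for name, deploy in ENGINES.items():   # deploy over the extensible list
    print(name, "->", deploy(design))
```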
International Conference on Conceptual Modeling | 2012
Petar Jovanovic; Oscar Romero; Alkis Simitsis; Alberto Abelló
We present our tool, GEM, for assisting designers in the error-prone and time-consuming tasks carried out at the early stages of a data warehousing project. Our tool semi-automatically produces multidimensional (MD) and ETL conceptual designs from a given set of business requirements (like SLAs) and data source descriptions. Subsequently, our tool translates both the MD and ETL conceptual designs produced into physical designs, so they can be further deployed on a DBMS and an ETL engine. In this paper, we describe the system architecture and present our demonstration proposal by means of an example.
IEEE Transactions on Knowledge and Data Engineering | 2016
Petar Jovanovic; Oscar Romero; Alkis Simitsis; Alberto Abelló
Business intelligence (BI) systems depend on efficient integration of disparate and often heterogeneous data. The integration of data is governed by data-intensive flows and is driven by a set of information requirements. Designing such flows is in general a complex process, which, due to the complexity of business environments, is hard to do manually. In this paper, we deal with the challenge of efficient design and maintenance of data-intensive flows and propose an incremental approach, namely CoAl, for semi-automatically consolidating data-intensive flows that satisfy a given set of information requirements. CoAl works at the logical level and consolidates data flows from either high-level information requirements or platform-specific programs. As CoAl integrates a new data flow, it opts for maximal reuse of existing flows and applies a customizable cost model tuned for minimizing the overall cost of a unified solution. We demonstrate the efficiency and effectiveness of our approach through an experimental evaluation using our implemented prototype.
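To give a flavor of cost-based consolidation with maximal reuse, here is a simplified Python sketch: a new flow shares the longest common operation prefix with an existing flow, and only the non-reused suffix contributes cost under a pluggable cost model. The operation encoding and costs are invented and far simpler than CoAl's actual logical-level matching.

```python
# Simplified take on consolidation with maximal reuse under a pluggable
# cost model; operations and costs are fabricated for illustration.

OP_COST = {"extract": 5, "filter": 1, "join": 4, "aggregate": 3, "load": 2}

def cost(ops, model=OP_COST):
    return sum(model[op.split("(")[0]] for op in ops)

def integrate(existing, new):
    """Share the longest common prefix; only the suffix adds cost."""
    k = 0
    while k < min(len(existing), len(new)) and existing[k] == new[k]:
        k += 1
    return k, cost(new[k:])

existing = ["extract(orders)", "filter(y=2024)", "load(dw)"]
new      = ["extract(orders)", "filter(y=2024)", "aggregate(sum)", "load(mart)"]
shared, added_cost = integrate(existing, new)
print(f"reused {shared} ops; marginal cost {added_cost}")
```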
Data Warehousing and OLAP | 2014
Emona Nakuçi; Vasileios Theodorou; Petar Jovanovic; Alberto Abelló
Obtaining the right set of data for evaluating the fulfillment of different quality standards in extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to privacy constraints, while providing a synthetic set of data is known to be a labor-intensive task that needs to take various combinations of process parameters into account. Additionally, a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such a demanding task, in this paper we propose an automatic data generator, named Bijoux. Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over data, and automatically generates testing datasets. At the same time, it considers different dataset and transformation characteristics (e.g., size, distribution, selectivity) in order to cover a variety of test scenarios. We report our experimental findings showing the effectiveness and scalability of our approach.
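A minimal sketch of the core idea, generating data that satisfies an operation's constraints at a target selectivity, might look like this; the single filter predicate and the selectivity knob are assumptions for illustration, not Bijoux's actual interface.

```python
# Minimal sketch of constraint-aware test-data generation in the spirit
# of Bijoux; the predicate and selectivity knob are invented for the demo.
import random

def generate(n, selectivity, lo=0, hi=100, threshold=50):
    """Emit n rows such that roughly `selectivity` of them satisfy
    the ETL filter `value > threshold`."""
    rows = []
    for _ in range(n):
        if random.random() < selectivity:
            rows.append({"value": random.randint(threshold + 1, hi)})  # passes
        else:
            rows.append({"value": random.randint(lo, threshold)})      # filtered out
    return rows

data = generate(1000, selectivity=0.3)
passed = sum(r["value"] > 50 for r in data)
print(f"{passed}/1000 rows pass the filter (target ~300)")
```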
International Conference on Management of Data | 2013
Alkis Simitsis; Kevin Wilkinson; Petar Jovanovic
As enterprises become more automated, real-time, and data-driven, they need to integrate new data sources and specialized processing engines. The traditional business intelligence architecture of Extract-Transform-Load (ETL) flows, followed by querying, reporting, and analytic operations, is being generalized to analytic data flows that utilize a variety of data types and operations. These complicated flows are difficult to design, implement, and maintain since they span a variety of systems. Additionally, new design requirements may be imposed, such as design for fault-tolerance, freshness, maintainability, or sampling. To reduce development time and maintenance costs, automation is needed. We present xPAD, our platform for managing analytic data flows. xPAD enables flow design, and we show how these designs can be optimized, not just for performance, but for other objectives as well. xPAD is engine-agnostic: we show how it can generate executable code for a number of execution engines. It can also import existing flows from other engines and optimize those flows. In that way, it can transform a flow written for one engine into an optimized flow for a different engine. In our demonstration, we will also use various example flows to show optimization for different objectives and compare flow execution on different engines.
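As a toy illustration of optimizing flows for objectives beyond performance, the sketch below scores alternative physical plans under user-chosen weights for runtime, fault tolerance, and freshness; the plans, metrics, and scoring function are fabricated for the example.

```python
# Toy multi-objective plan selection; all plans, metrics, and the
# scoring function are fabricated for the example.

plans = {
    "single-engine": {"runtime": 120, "fault_tolerance": 0.2, "freshness": 0.5},
    "hybrid":        {"runtime": 80,  "fault_tolerance": 0.6, "freshness": 0.8},
}

def score(metrics, weights):
    """Lower is better: runtime is penalized, other objectives rewarded."""
    return (weights["runtime"] * metrics["runtime"]
            - weights["fault_tolerance"] * 100 * metrics["fault_tolerance"]
            - weights["freshness"] * 100 * metrics["freshness"])

weights = {"runtime": 1.0, "fault_tolerance": 0.5, "freshness": 0.3}
best = min(plans, key=lambda p: score(plans[p], weights))
print("chosen plan:", best)
```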