Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies | 2019

Managing Open Data Evolution through Bi-dimensional Mappings

Abstract

The availability of large Open Data sources creates opportunities for data analytics in different domains. To be used effectively, however, the data must be correctly extracted, formatted and integrated, which is especially challenging for Open Data sources, since there is usually less rigour in standardizing subsequent data releases. Open Data evolution must therefore be handled. A domain-specific solution that takes existing approaches into account, but restricts the kinds of operations and mappings, would be useful for providing coarse-grained management of data evolution operations over time. In this paper, we present a solution for managing Open Data evolution that aims to integrate periodically released data sets. We define a set of operations acting over the instances, schema and mappings, which are executed after each new data release. These operations rely on the existence of a time dimension in the input mappings. The approach is validated on a real-world case study, in which it is currently used to integrate and access a large Brazilian educational Open Data source, with billions of records and hundreds of columns evolving over many years. The proposed solution successfully integrated more than 90 data releases from 2012 to 2018.

DOI 10.1145/3365109.3368774

Language English

Journal Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
