Archive | 2019
Seleção de atributos de dados inconsistentes em ambiente HDF5+Python na cloud INCD
Abstract
The treatment of large datasets is an issue that is often addressed today and whose task is not simple, given the computational limitations that still exist. One possible approach is to perform a feature selection that allows a considerably reduction of data size without increasing inconsistency. Logical Analysis of Inconsistent Data (LAID) is a systematic, robust methodology that is easy to interpret and can handle inconsistent data. The paradigm regarding the handling of large data has has been changing over. Previously, data processing was performed on a single computer, with in-memory data access. The current trend is to access data on disk, in a cloud environment. The present work intends to validate this new paradigm, using HDF5 data system and remote environment provided by INCD. Because HDF5 is the system adopted by Python’s community to handle large datasets, this language was chosen for LAID algorithm implementation.