Archive | 2019

Feature selection of inconsistent data in an HDF5+Python environment on the INCD cloud

Abstract


Handling large datasets is a problem that is frequently addressed today, and it is not a simple one, given the computational limitations that still exist. One possible approach is feature selection, which can considerably reduce the size of the data without increasing its inconsistency. Logical Analysis of Inconsistent Data (LAID) is a systematic, robust methodology that is easy to interpret and can handle inconsistent data. The paradigm for handling large data has been changing over time. Previously, data processing was performed on a single computer, with the data accessed in memory; the current trend is to access data on disk, in a cloud environment. The present work intends to validate this new paradigm, using the HDF5 data format and the remote computing environment provided by INCD. Because HDF5 is the format the Python community has adopted for handling large datasets, Python was chosen as the language for implementing the LAID algorithm.
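
The key constraint in the abstract, reducing the number of features without increasing inconsistency, can be illustrated with a small in-memory example. In the LAD family of methods this is commonly framed as a set-covering problem over a "disjoint matrix": for every pair of observations with different class labels, record the features on which they differ, then choose features so that every such pair is still separated by at least one chosen feature. The sketch below assumes that framing and solves the covering step with a simple greedy heuristic. It is not the authors' LAID implementation, it ignores the HDF5/on-disk aspect entirely, and the function names (`inconsistency`, `disjoint_matrix`, `greedy_cover`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Row = Tuple[int, ...]


def inconsistency(rows: Sequence[Row], labels: Sequence[int], features: Sequence[int]) -> int:
    """Count observations whose values on `features` also occur with another label."""
    label_sets: Dict[Row, Set[int]] = defaultdict(set)
    sizes: Dict[Row, int] = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)  # project the row onto the chosen features
        label_sets[key].add(y)
        sizes[key] += 1
    return sum(sizes[k] for k, ys in label_sets.items() if len(ys) > 1)


def disjoint_matrix(rows: Sequence[Row], labels: Sequence[int]) -> List[Set[int]]:
    """For each pair of observations with different labels, the set of features where they differ.

    Pairs identical on every feature (inconsistencies already present in the
    data) give an empty set and are skipped, since no feature can separate them.
    """
    matrix: List[Set[int]] = []
    for (r1, y1), (r2, y2) in combinations(zip(rows, labels), 2):
        if y1 == y2:
            continue
        diff = {f for f, (a, b) in enumerate(zip(r1, r2)) if a != b}
        if diff:
            matrix.append(diff)
    return matrix


def greedy_cover(matrix: List[Set[int]], n_features: int) -> List[int]:
    """Greedy set cover (assumed heuristic): repeatedly pick the feature that
    separates the most still-unseparated pairs; ties go to the lowest index."""
    uncovered = list(matrix)
    selected: List[int] = []
    while uncovered:
        best = max(range(n_features), key=lambda f: (sum(f in s for s in uncovered), -f))
        selected.append(best)
        uncovered = [s for s in uncovered if best not in s]
    return sorted(selected)


# Toy data (hypothetical): feature 2 is the complement of feature 0, so it is
# redundant; rows 3 and 4 are identical but carry different labels.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 0)]
labels = [0, 1, 1, 0, 1]

selected = greedy_cover(disjoint_matrix(rows, labels), n_features=3)
print(selected)  # [0, 1]
print(inconsistency(rows, labels, [0, 1, 2]), inconsistency(rows, labels, selected))  # 2 2
```

Because every pair that was separable on the full feature set stays separated by some selected feature, projecting onto the selected features cannot create new conflicts; only the two observations that were already inconsistent remain so. Dropping one more feature (keeping only feature 0) raises the inconsistency count from 2 to 5. A greedy cover is not guaranteed to be minimal, which is one reason exact or more elaborate covering methods are often preferred.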

Volume 14
Pages 85-112
DOI 10.34627/RCC.V14I0.184
Language English