Proceedings of the 2019 International Conference on Management of Data
Arachnid: Generalized Visual Data Cleaning
Data cleaning is an inherently exploratory and visual process. Visualizations help analysts spot errors and surprising patterns or relationships that would otherwise go unnoticed. Once these errors are identified, the analyst wants to apply transformations to correct them. However, switching back and forth between transforming the data and visualizing it is tedious, so there is a need to visualize and clean data on the fly. Current solutions address this need by letting users specify data cleaning transformations through a limited set of interactions that directly manipulate visualizations pre-defined by the cleaning system itself. These visualizations are rendered either as tables (OpenRefine, Wrangler, Microsoft Excel) or as bar charts (Tableau Prep). By mapping a narrow set of mouse-based interactions to data cleaning specifications, these systems let users transform data intuitively and quickly, but only within a constrained set of use cases. To gain a comprehensive visual understanding of the data and to clean it quickly for other use cases, analysts often need to generate multiple visualizations of common types and interactively execute data transformations beyond those supported by current cleaning software. We therefore propose Arachnid, a novel system that builds on existing work in generalized selection and direct manipulation to introduce a model for translating mouse-based interactions on common types of user-defined visualizations into an enhanced set of data cleaning transformations.
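To make the idea of translating a mouse-based interaction into a cleaning transformation more concrete, here is a minimal Python sketch written under stated assumptions. It is not Arachnid's implementation or API. It assumes a rectangular brush drawn on a scatter plot whose pixel bounds have already been inverted into data coordinates. Every name in it (`BrushSelection`, `to_predicate`, `delete_selected`, `set_selected`) is hypothetical. The central step is that a single selection compiles into a row predicate, and several different transformations can then reuse that predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class BrushSelection:
    """Hypothetical rectangular brush on a scatter plot of x_col vs. y_col.

    Assumes the brush bounds are already expressed in data space
    (i.e., pixel coordinates were inverted through the chart's scales).
    """
    x_col: str
    y_col: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def to_predicate(self) -> Callable[[Row], bool]:
        # Compile the visual selection into a predicate over data rows.
        def predicate(row: Row) -> bool:
            return (self.x_min <= row[self.x_col] <= self.x_max
                    and self.y_min <= row[self.y_col] <= self.y_max)
        return predicate


def delete_selected(rows: List[Row], sel: BrushSelection) -> List[Row]:
    """Hypothetical 'delete' gesture: drop every row inside the brush."""
    pred = sel.to_predicate()
    return [r for r in rows if not pred(r)]


def set_selected(rows: List[Row], sel: BrushSelection, col: str, value: Any) -> List[Row]:
    """Hypothetical 'edit' gesture: overwrite one column for rows inside the brush.

    Returns new row dicts; the input rows are not mutated.
    """
    pred = sel.to_predicate()
    return [{**r, col: value} if pred(r) else r for r in rows]


if __name__ == "__main__":
    rows = [
        {"age": 34, "income": 52_000},
        {"age": 210, "income": 48_000},  # implausible age that a scatter plot exposes
        {"age": 41, "income": 61_000},
    ]
    brush = BrushSelection("age", "income", 150, 300, 0, 100_000)
    print(delete_selected(rows, brush))
    print(set_selected(rows, brush, "age", None))
```

Keeping the selection (a predicate) separate from the transformation is what lets one gesture drive several cleaning operations, such as deleting the brushed outliers or marking their values as missing.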