Proc. VLDB Endow. | 2019

Selectivity Estimation for Range Predicates using Lightweight Models


Abstract


Query optimizers depend on selectivity estimates of query predicates to produce a good execution plan. When a query contains multiple predicates, today's optimizers use a variety of assumptions, such as independence between predicates, to estimate selectivity. While such techniques have the benefit of fast estimation and a small memory footprint, they often incur large selectivity estimation errors. In this work, we reconsider selectivity estimation as a regression problem. We explore the application of neural networks and tree-based ensembles to the important problem of selectivity estimation of multi-dimensional range predicates. While their straightforward application does not outperform even simple baselines, we propose two simple yet effective design choices, namely regression label transformation and feature engineering, motivated by the selectivity estimation context. Through extensive empirical evaluation across a variety of datasets, we show that the proposed models deliver both highly accurate estimates and fast estimation.
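
As an illustration of why the independence assumption mentioned in the abstract can produce large errors, the following minimal Python sketch compares the true selectivity of a two-dimensional range predicate with the product of per-attribute selectivities on a small synthetic table whose two columns are perfectly correlated. The table, the function names, the q-error metric, and the logarithmic label transform are all assumptions made for this example; the abstract does not specify which label transformation or features the paper uses.

```python
import math

def true_selectivity(rows, lo, hi):
    """Fraction of rows whose every attribute i lies in [lo[i], hi[i]]."""
    hits = sum(
        all(l <= v <= h for v, l, h in zip(row, lo, hi)) for row in rows
    )
    return hits / len(rows)

def independence_estimate(rows, lo, hi):
    """Product of per-attribute selectivities (independence assumption)."""
    est = 1.0
    for i, (l, h) in enumerate(zip(lo, hi)):
        est *= sum(l <= row[i] <= h for row in rows) / len(rows)
    return est

def q_error(est, actual, floor=1e-6):
    """Multiplicative error, max(est/actual, actual/est); assumed metric."""
    est, actual = max(est, floor), max(actual, floor)
    return max(est / actual, actual / est)

def log_label(selectivity, floor=1e-6):
    """Hypothetical label transform: regress on log(selectivity)."""
    return math.log(max(selectivity, floor))

# Synthetic table (an assumption): second column always equals the first.
rows = [(i % 100, i % 100) for i in range(10_000)]

# Range predicate: 0 <= a <= 9 AND 0 <= b <= 9.
lo, hi = (0, 0), (9, 9)

actual = true_selectivity(rows, lo, hi)          # about 0.1
estimate = independence_estimate(rows, lo, hi)   # about 0.1 * 0.1
print(actual, estimate, q_error(estimate, actual))
print(log_label(actual))  # the value a regressor would be trained on
```

Because the two columns are identical, the second predicate filters out nothing beyond the first, yet the independence-based estimate multiplies the two selectivities and underestimates by roughly a factor of ten. A regression model trained on (predicate bounds, transformed selectivity) pairs is one way to capture such correlations, which is the framing the abstract describes.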

Volume 12
Pages 1044-1057
DOI 10.14778/3329772.3329780
Language English
Journal Proc. VLDB Endow.
