2019 IEEE 35th International Conference on Data Engineering (ICDE) | 2019
A Hierarchical Framework for Top-k Location-Aware Error-Tolerant Keyword Search
Location-aware services have become widely available on a variety of devices. The resulting fusion of spatio-textual data enables the kind of top-k query that takes into account both location proximity and text relevance. Considering both the misspellings in user input and the data quality issues of spatiotextual databases, it is necessary to support error-tolerant spatial keyword search for end-users. Existing studies mainly focused on set-based textual relevance, but they cannot find reasonable results when the input tokens are not exactly matched with those from records in the database. In this paper, we propose a novel framework to solve the problem of top-k location-aware similarity search with fuzzy token matching. We propose a hierarchical index HGR-Tree to capture signatures of both spatial and textual relevance. Based on such an index structure, we devise a best-first search algorithm to preferentially access nodes of HGR-Tree with more similar objects while those with dissimilar ones can be pruned. We further devise an incremental search strategy to reduce the overhead brought by supporting fuzzy token matching. Experimental results on real world POI datasets show that our framework outperforms state-of-the-art methods by one to two orders of magnitude.