2019 IEEE 35th International Conference on Data Engineering (ICDE) | 2019
Towards Longitudinal Analytics on Social Media Data
Abstract
We are witnessing increasing interest in longitudinal analytics on social media data. Unlike real-time search, which considers only recently generated social media data, longitudinal analytics takes an interval into account and considers the temporal popularity of social media data within that interval. We study a fundamental functionality in longitudinal analytics: top-k temporal keyword (TkTK) querying. A TkTK query takes as input a set of query keywords and an interval, and returns the top-k most significant social items, e.g., tweets, where the significance of a social item is defined as a combination of its textual relevance and temporal popularity. We model social media data as a forest of linkage trees along the time dimension, which captures the propagation processes, e.g., replies and forwards, among social items. Based on the forest, we model the temporal popularity of a social item as a popularity time series. We design two indexing structures that index the popularity time series and textual content of social items in a holistic manner: the temporal popularity inverted index (TPII) and the log-structured merge octree (LSMO). Empirical studies with two substantial social media data sets offer insight into the design properties of the indexes and confirm that LSMO enables both efficient query processing and efficient index updates.
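To make the query semantics concrete, the following is a minimal brute-force sketch of a TkTK query in Python. It is not the TPII or LSMO index, and it is not the paper's scoring function: the abstract states only that significance combines textual relevance and temporal popularity, so the linear weighting with `alpha`, the keyword-overlap relevance, the summed popularity counts, and all names (`SocialItem`, `tktk_query`, and so on) are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class SocialItem:
    item_id: str
    keywords: set
    # Assumed representation of a popularity time series:
    # time step -> number of replies/forwards received at that step.
    popularity: dict = field(default_factory=dict)


def textual_relevance(item, query_keywords):
    """Fraction of query keywords that appear in the item (assumed measure)."""
    if not query_keywords:
        return 0.0
    return len(item.keywords & query_keywords) / len(query_keywords)


def temporal_popularity(item, start, end):
    """Sum of the popularity time series over the closed interval [start, end]."""
    return sum(c for t, c in item.popularity.items() if start <= t <= end)


def tktk_query(items, query_keywords, start, end, k, alpha=0.5):
    """Return the k highest-scoring (score, item_id) pairs by a full scan.

    The score is an assumed linear combination:
    alpha * relevance + (1 - alpha) * normalized popularity.
    """
    query_keywords = set(query_keywords)
    # Keep only items that match at least one query keyword.
    candidates = []
    for item in items:
        rel = textual_relevance(item, query_keywords)
        if rel > 0.0:
            candidates.append((item, rel, temporal_popularity(item, start, end)))
    # Normalize popularity by the largest value among the candidates.
    max_pop = max((pop for _, _, pop in candidates), default=0) or 1
    scored = [
        (alpha * rel + (1 - alpha) * pop / max_pop, item.item_id)
        for item, rel, pop in candidates
    ]
    return heapq.nlargest(k, scored)
```

A full scan like this touches every item on every query, which is the cost that an index over both the popularity time series and the textual content is meant to avoid.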