Lecture Notes in Social Networks | 2021
Analysis of Link Prediction Algorithms in Hashtag Graphs
Abstract
Twitter is a prominent multilingual social networking site on which users post messages known as “tweets”. Like other social networking sites such as Facebook, Twitter allows users to categorize their posts with “hashtags”. Communication on Twitter can be mapped onto hashtag graphs, in which vertices correspond to hashtags and edges correspond to co-occurrences of two hashtags within the same tweet. Furthermore, each vertex in a hashtag graph can be weighted by the number of tweets in which its hashtag occurs, and each edge by the number of tweets in which both of its hashtags co-occur, yielding a “weighted hashtag graph”. In this chapter, we describe additions to several well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account. These novel predictive additions rest on the assumption that more popular hashtags are more likely to co-occur with other hashtags in the future. We then apply the extended methods to three sets of Twitter data with the aim of predicting future hashtag co-occurrences. In addition, we investigate the performance of SEAL, a new graph neural network-based framework that has previously been shown to outperform heuristic-based approaches such as the Katz index, SimRank and rooted PageRank. Experiments were conducted on real-life data sets comprising over 3,000,000 unique tweets and over 250,000 unique hashtags in total. The results show that the simpler heuristic-based scoring methods achieve only marginal performance, which degrades as more data is added over time. SEAL, on the other hand, achieves superior link prediction performance on hashtag graphs compared with the approaches against which it has previously been evaluated in other domains.
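To make the weighting concrete, here is a minimal Python sketch, assuming each tweet is given as a list of hashtag strings. It builds the vertex weights (number of tweets containing a hashtag) and edge weights (number of tweets containing both hashtags) described above, then scores candidate pairs with two generic weighted heuristics: a weighted common-neighbours sum and a preferential-attachment-style product of tweet counts that echoes the popularity assumption. The helper names (`build_weighted_hashtag_graph`, `edge_w`, `weighted_common_neighbors`, `popularity_score`) and the toy tweets are hypothetical, and the two scores are illustrative stand-ins rather than the chapter's own additions, whose formulas are not given in this abstract.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_weighted_hashtag_graph(tweets):
    """Return (vertex_weight, edge_weight, neighbors) for a list of tweets.

    Each tweet is an iterable of hashtag strings. A hashtag repeated inside
    one tweet is counted once, so every weight is a number of distinct tweets.
    """
    vertex_weight = Counter()     # hashtag -> number of tweets containing it
    edge_weight = Counter()       # (a, b) with a < b -> tweets containing both
    neighbors = defaultdict(set)  # hashtag -> hashtags it has co-occurred with
    for tags in tweets:
        unique = sorted(set(tags))
        vertex_weight.update(unique)
        for a, b in combinations(unique, 2):
            edge_weight[(a, b)] += 1
            neighbors[a].add(b)
            neighbors[b].add(a)
    return vertex_weight, edge_weight, neighbors


def edge_w(edge_weight, a, b):
    """Order-independent edge weight lookup (0 if the pair never co-occurred)."""
    return edge_weight[(a, b) if a < b else (b, a)]


# Hypothetical scoring helpers: generic weighted heuristics for illustration,
# not the chapter's own formulas.
def weighted_common_neighbors(x, y, edge_weight, neighbors):
    """Sum of co-occurrence counts routed through each shared neighbour z."""
    shared = neighbors.get(x, set()) & neighbors.get(y, set())
    return sum(edge_w(edge_weight, x, z) + edge_w(edge_weight, y, z) for z in shared)


def popularity_score(x, y, vertex_weight):
    """Preferential-attachment-style score: product of the two tweet counts."""
    return vertex_weight[x] * vertex_weight[y]


# Hypothetical toy data: each inner list is the hashtags of one tweet.
tweets = [
    ["#ai", "#ml"],
    ["#ai", "#ml", "#ai"],  # the repeated #ai is counted once
    ["#ml", "#data"],
    ["#ml", "#data"],
    ["#ai", "#data"],
    ["#data", "#python"],
]
vw, ew, nb = build_weighted_hashtag_graph(tweets)

# Score the two pairs that have not co-occurred yet.
for x, y in [("#ai", "#python"), ("#ml", "#python")]:
    print(x, y, weighted_common_neighbors(x, y, ew, nb), popularity_score(x, y, vw))
# prints:
# #ai #python 2 3
# #ml #python 3 4
```

Under both heuristics the pair #ml/#python outranks #ai/#python, because #ml co-occurs more often with the shared neighbour #data and appears in more tweets overall. This is only a toy illustration of how vertex and edge weights can shift a ranking, not a reproduction of the chapter's experiments.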
The AUC of 0.959 obtained by SEAL in our experiments significantly exceeds the scores of our benchmark link prediction approaches, which include the Katz index, SimRank, and rooted PageRank.
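In link prediction, AUC is commonly read as the probability that a randomly chosen true future link receives a higher score than a randomly chosen non-link, with ties counted as one half. The minimal Python sketch below computes that quantity directly by comparing every positive/negative pair. It assumes the scores for held-out co-occurrences (positives) and for sampled hashtag pairs that never co-occur (negatives) are already available. The function name `link_prediction_auc` and the sample scores are hypothetical, and the chapter's own evaluation protocol (for example, how negative pairs are sampled) is not described in this abstract.

```python
from itertools import product


def link_prediction_auc(positive_scores, negative_scores):
    """AUC as the probability that a true link outscores a non-link.

    Every (positive, negative) pair is compared; a tie counts as half a win.
    This equals the area under the ROC curve for the same scores.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(positive_scores, negative_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores, not results from the chapter's experiments.
pos = [0.9, 0.8, 0.4]  # pairs that do co-occur in the future
neg = [0.7, 0.4, 0.1]  # sampled pairs that never co-occur
print(link_prediction_auc(pos, neg))  # 7.5 of 9 pairs won, about 0.833
```

The pairwise loop costs O(P·N) comparisons and is meant only to show the definition; for large test sets, a rank-based implementation such as scikit-learn's `roc_auc_score` gives the same value more efficiently.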