Harbinger: An Analyzing and Predicting System for Online Social Network Users' Behavior
Rui Guo, Hongzhi Wang, Lucheng Zhong, Jianzhong Li, Hong Gao
Harbin Institute of Technology, Harbin, Heilongjiang, China {ruiguo,wangzh,zlc,lijzh,honggao}@hit.edu.cn
Abstract.
Online Social Networks (OSNs) are among the most prominent innovations of recent years, with more than a billion active users. Users’ behavior is one of the most important aspects of OSNs to study. This demonstration proposal presents Harbinger, a system for analyzing and predicting OSN users’ behavior. In Harbinger, we focus on tweets’ timestamps (when users post or share messages), visualize users’ post behavior as well as message retweet numbers, and build adjustable models to predict users’ behavior. Predictions can be made with the discovered behavior models, and the results can be applied to applications such as tweet crawlers and advertising.
Keywords:
Social Network, User Behavior, Message Timestamp
Online social networks have grown explosively in the past years. For instance, Twitter has 200 million active users who post an average of 400 million tweets every day [?]. Since this large group of users makes OSNs valuable for both commercial and academic applications, understanding users’ behavior could help these applications improve their efficiency and effectiveness.

Understanding users’ behavior brings challenges. The most important is the demand on computing resources: with billions of active OSN users, many models or parameters are required to describe users’ different behaviors. Another difficulty is that users are influenced by many factors, most of which are invisible through OSNs. This makes users’ behavior difficult to predict.

Existing work studies OSN users’ behavior in several different ways. [?] characterizes behavior by clickstream data, summarizing HTTP sessions from an aggregation website. They conclude that browsing accounts for 92% of all user activity, but this observation cannot be obtained from public OSN data. [?] downloaded user profile pages and modeled users’ online time with Weibull distributions.

To analyze and predict users’ behavior, we present the Harbinger system. Harbinger has two major functions: analysis of users’ post behavior and analysis of a single message’s retweet number. Statistical methods from [?] are applied to avoid the effect of invisible factors. We observe both a group of users and single tweets, collect message timestamps and message retweet numbers, visualize users’ post behavior and the variation of the message retweet number, and describe them with a Gaussian Mixture Model and a Logarithm Model. Unlike previous work, Harbinger analyzes users’ post behavior through tweets’ timestamps rather than users’ clickstream, online time or friendship.

The remainder of this demonstration proposal is organized as follows. In Section 2, we give an overview of Harbinger and introduce the system architecture. In Section 3, we present the mathematical models and algorithms. In Section 4, we give the demonstration scenario of the system.
Our system provides two major functions: analysis of users’ post behavior and analysis of a single message’s retweet number.

For OSN users’ post behavior, the user of Harbinger chooses an analyzing function, a target OSN user, and an analyzing pattern (daily, weekly, or monthly). The corresponding data are then selected and the statistics are computed. For the daily pattern, we count messages in one-hour intervals (from 00:00 to 24:00); for the weekly and monthly patterns, we count messages in one-day intervals from the beginning to the end of the week or month. Finally, a figure of message number against time, together with the analysis results, is plotted, as in Figure 2(a).

For the retweet number function, the user chooses the retweet function and a target tweet rather than a target OSN user. The system analyzes the data and plots the relationship between time and the number of retweets of the selected tweet, as in Figure 2(b).
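The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not Harbinger's code: the function names and the sample timestamps are hypothetical, and we assume the timestamps are already available as `datetime` objects.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Daily pattern: number of messages in each one-hour interval (24 values)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def weekday_counts(timestamps):
    """Weekly pattern: number of messages per day, Monday=0 ... Sunday=6."""
    counts = Counter(ts.weekday() for ts in timestamps)
    return [counts.get(day, 0) for day in range(7)]


# Hypothetical timestamps, made up for illustration only.
sample = [
    datetime(2013, 5, 6, 12, 15),
    datetime(2013, 5, 6, 12, 40),
    datetime(2013, 5, 7, 19, 5),
]
daily = hourly_counts(sample)
weekly = weekday_counts(sample)
```

The resulting lists are the kind of per-interval message numbers that would be plotted against time; a monthly pattern would follow the same idea with `ts.day` as the key.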
Fig. 1. Modules of Harbinger
As shown in Figure 1, Harbinger has five major modules: the crawl, storage, interface, analyzer, and figure drawer modules.

In the crawl module, we develop an OSN crawler that collects information through the official OSN API; the crawler is also an application of our prediction model. It collects message information, including content and timestamp, and passes it to the storage module, where all data are kept in a database. The User Interface (UI) of the interface module connects visitors to Harbinger and to the other modules. In the UI, the user of Harbinger selects a target OSN user or tweet and the analyzing pattern. The selection is sent to the analyzer, which computes the statistics and performs the analysis and calculation based on the models in Section 3. The results are sent to the figure drawer, which draws the figures in the UI.
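To make the data flow between the storage module and the analyzer more concrete, here is a small sketch using Python's built-in `sqlite3`. The text does not specify the database or its schema, so the table layout, column names and helper functions below are assumptions chosen for illustration; the crawler itself is left out because the OSN API calls are not described.

```python
import sqlite3

# In-memory database standing in for the storage module (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, user TEXT, content TEXT, posted_at TEXT)"
)


def store_messages(conn, rows):
    """Store crawled (user, content, ISO timestamp) tuples."""
    conn.executemany(
        "INSERT INTO messages (user, content, posted_at) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def timestamps_for_user(conn, user):
    """Return the selected user's message timestamps, oldest first."""
    cur = conn.execute(
        "SELECT posted_at FROM messages WHERE user = ? ORDER BY posted_at",
        (user,),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical rows, made up for illustration only.
store_messages(conn, [
    ("alice", "hello", "2013-05-06T12:15:00"),
    ("bob", "hi", "2013-05-06T13:00:00"),
    ("alice", "again", "2013-05-07T19:05:00"),
])
alice_times = timestamps_for_user(conn, "alice")
```

In this sketch the UI selection corresponds to the `user` argument, and the returned timestamps are what the analyzer would turn into statistics.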
In this section, we describe the models and algorithms used in our system, which form the core of the analyzer. Based on the techniques in [?], we develop a Gaussian Mixture Model (GMM) to describe OSN user behavior and a Logarithm Model to describe the relationship between retweet number and time. Figure 2 shows the results of the GMM and the Logarithm Model.

Fig. 2. Figure of retweet number and time. (a) Figure of Gaussian Mixture Model. (b) Figure of Logarithm Model.
Gaussian Mixture Model
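As a rough illustration of the fit this subsection relies on, the sketch below runs EM for a two-component, one-dimensional Gaussian mixture in plain Python. It is not Harbinger's implementation: the function names, the initial guesses, the mixing weight `w1` (which the formula in this subsection does not show) and the synthetic posting hours are all assumptions made for the example. It also treats the time of day as a point on a line and ignores the wrap-around at midnight.

```python
import math
import random


def gaussian_pdf(t, mu, sigma):
    """Density of a Normal distribution with mean mu and std. deviation sigma."""
    return math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma
    )


def fit_two_gaussians(times, mu1, mu2, sigma1=2.0, sigma2=2.0, iterations=100):
    """EM for a two-component mixture; returns (w1, mu1, sigma1, mu2, sigma2)."""
    w1 = 0.5  # assumed mixing weight of component 1
    n = len(times)
    for _ in range(iterations):
        # E-step: probability that each observation belongs to component 1.
        resp = []
        for t in times:
            p1 = w1 * gaussian_pdf(t, mu1, sigma1)
            p2 = (1 - w1) * gaussian_pdf(t, mu2, sigma2)
            total = p1 + p2
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, means and deviations from responsibilities.
        n1 = sum(resp)
        n2 = n - n1
        if n1 < 1e-9 or n2 < 1e-9:
            break  # one component has collapsed; keep the last estimate
        mu1 = sum(r * t for r, t in zip(resp, times)) / n1
        mu2 = sum((1 - r) * t for r, t in zip(resp, times)) / n2
        var1 = sum(r * (t - mu1) ** 2 for r, t in zip(resp, times)) / n1
        var2 = sum((1 - r) * (t - mu2) ** 2 for r, t in zip(resp, times)) / n2
        sigma1 = max(math.sqrt(var1), 1e-3)
        sigma2 = max(math.sqrt(var2), 1e-3)
        w1 = n1 / n
    return w1, mu1, sigma1, mu2, sigma2


# Synthetic posting hours with peaks near noon and early evening (made up).
random.seed(0)
times = [random.gauss(12.0, 1.0) for _ in range(500)]
times += [random.gauss(19.0, 1.5) for _ in range(500)]
w1, mu1, sigma1, mu2, sigma2 = fit_two_gaussians(times, mu1=10.0, mu2=20.0)
```

On this synthetic data the two estimated means should land close to the two peaks that generated it. With real timestamps, the input would be the hour of day of each message.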
In the daily pattern of users’ post behavior, we find that the number of new messages as a function of the time of day follows the sum of two Gaussian (Normal) distributions. OSN users often work during the day and rest at noon and dusk, so there are two peaks of new OSN messages, and the curve around each peak resembles a Gaussian distribution. As a result, the figure can be treated as a mixture of two Gaussian distributions.

We therefore use a Gaussian Mixture Model (GMM) [?] to compute the unknown parameters of the figure. Let $t$ be the time of day, $f(t)$ the number of new messages at time $t$, and $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ the two Gaussian distributions. Then

$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{(t-\mu_1)^2}{2\sigma_1^2}} + \frac{1}{\sqrt{2\pi}\,\sigma_2} e^{-\frac{(t-\mu_2)^2}{2\sigma_2^2}}$$

To determine the parameters ($\mu_1$, $\sigma_1$, $\mu_2$, $\sigma_2$) of the GMM, we apply the Expectation-Maximization (EM) algorithm [?], which is the procedure used to fit the GMM. Figure 2(a) shows the results of the GMM. In Figure 2(a), the solid line is the original figure of time and message number (the user's post frequency over a day), and the dotted line is the result of the EM algorithm. The results show that the post frequency is indeed similar to the sum of two Gaussian distributions.

Logarithm Model
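To give a feel for how the Logarithm Model of this subsection can be fitted, the sketch below uses a reduced form, RN = a ln(t + c) + b. This is an assumption made for the example: for a positive x-axis stretch, the stretch, shift and base parameters collapse into three, with a absorbing the y-axis stretch and the base, c equal to the x-axis shift divided by the x-axis stretch, and b absorbing the y-axis shift. The fitting strategy is also a swapped-in simplification rather than the iterative method in the text: for each candidate shift c the model is linear in a and b, so ordinary least squares gives them in closed form, and we keep the c with the smallest squared error. All names and the synthetic data are hypothetical.

```python
import math


def fit_line(us, ys):
    """Ordinary least squares for y = a*u + b; returns (a, b, squared error)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = suy / suu
    b = mean_y - a * mean_u
    sse = sum((a * u + b - y) ** 2 for u, y in zip(us, ys))
    return a, b, sse


def fit_log_model(ts, rns, shifts):
    """Fit RN = a*ln(t + c) + b, trying each candidate shift c in turn."""
    best = None
    for c in shifts:
        us = [math.log(t + c) for t in ts]  # requires t + c > 0
        a, b, sse = fit_line(us, rns)
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best


# Synthetic, noise-free retweet counts (made up for illustration).
ts = [float(t) for t in range(0, 48)]           # hours since posting
rns = [30.0 * math.log(t + 2.0) + 5.0 for t in ts]
shifts = [0.5 * i for i in range(1, 21)]        # candidates 0.5, 1.0, ..., 10.0
a, b, c, sse = fit_log_model(ts, rns, shifts)
```

Because the synthetic data are noise-free and the true shift is among the candidates, the search recovers the generating parameters. With real retweet counts, a finer grid or an iterative refinement of c would be a natural next step.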
We find that the relation between the retweet number of a specific message and its posted time follows a logarithm function. After stretching and shifting, a basic logarithm function describes the retweet number well. In this curve, the x-axis is the posted time of the message (i.e., how long ago the tweet was posted) and the y-axis is the retweet number.

The retweet number grows very fast right after the tweet is posted, and as $t$ increases the growth slows and the curve flattens. We thus compute the retweet number $RN$ at posted time $t$ (i.e., how long ago the message was posted) as

$$RN = k_2 \log_{base}(k_1 t + k_3) + k_4$$

where $base$ is the base of the logarithm function, representing the steepness of the curve, $k_1$ is the x-axis stretch parameter, $k_2$ is the y-axis stretch parameter, $k_3$ is the x-axis shift parameter, and $k_4$ is the y-axis shift parameter.

To compute the parameters of the Logarithm Model, we apply the Least Squares algorithm [?]. The basic idea of the Least Squares algorithm is to approximate the model by a linear function and to refine the parameters iteratively.

Figure 2(b) shows the result of the Logarithm Model. The solid line is the original figure of retweet number growth over time (how the retweet number changes after the message is posted), and the dotted line is the result of the Logarithm Model. The result shows that the original figure is indeed similar to the figure of a logarithm function.

Harbinger