Human Mobility and Predictability enriched by Social Phenomena Information

Abstract

The massive amounts of geolocation data collected from mobile phone records have sparked an ongoing effort to understand and predict the mobility patterns of human beings. In this work, we study the extent to which social phenomena are reflected in mobile phone data, focusing in particular on the cases of urban commute and major sports events. We illustrate how these events are reflected in the data, and show how information about the events can be used to improve predictability in a simple model for a mobile phone user's location.



Nicolas B. Ponieman

Grandata Labs, Argentina

Alejo Salles

Physics Dept., UBA, Argentina

Carlos Sarraute

Grandata Labs, Argentina


I. INTRODUCTION

Mobile phone operators have access to an unprecedented volume of information about users' real-world activities. The records of calls and messages exchanged between their users provide a deep insight into the interactions and activities of millions of individuals. The social graph induced by mobile communications has provided a rich field to apply social network analysis to real-world problems. For instance, we can highlight the use of community detection techniques (see [1][2][3][4]), and the more recent advances in detecting the evolution of communities in dynamic networks (taking into account the evolution of the social graph over time) in [5][6].

A key aspect of the data collected by mobile phone operators that has attracted considerable attention in recent years is the information about how people are moving in the real world. In fact, mobile phone records can be considered as the most detailed information on human mobility across a large part of the population [7]. The study of the dynamics of human mobility using the collected geolocations of users, and applying it to predict future users' locations, has been an active field of research [8][9]. In particular, this information can be used to validate human mobility models (as the authors of [10] did with the information from a location-based social networking site), and to study the interplay between individual mobility and social networks [11].

The study of human mobility can be applied to domains as diverse as city planning and traffic engineering (e.g. to optimize the public transportation system and the road network); public health (e.g. to allow health officials to track and predict the spread of contagious diseases); or humanitarian relief after a large-scale disaster (see [9], wherein the authors study population movements after the 2010 Haiti earthquake). Using real-world data to understand human mobility is critical to such applications. On the business side of applications, mobile carriers are seeking new revenue streams based on the anonymized and aggregated analysis of their subscribers' mobility data [12].

In this work, we study the extent to which social phenomena are reflected in mobile phone data, focusing in particular on the cases of urban commute and major sports events. The rest of the paper is organized as follows. Section II briefly describes the real-world data source that we used for our experiments. In Section III, we present a simple model to predict the location of a mobile phone user, which we use as a baseline of predictability. In Section IV, we illustrate how urban commute can be observed in the data, and compute basic metrics. We also show the mobility pattern associated with a sports event (namely a soccer match). In Section V, we show how information about social events can be used to improve the predictability of the simple model. We illustrate this idea in the case of soccer matches, and use the information of the soccer fixture to improve location predictions. Section VI concludes the paper and discusses ideas for future work.

II. MOBILE DATA SOURCE

Fig. 1. Call distribution according to the day of the week (Sunday through Saturday) and the hour (0 to 23), averaged over a period of five months, in Argentina. The contrast between the weekend and the workweek is evident. It is also interesting to observe the communication peaks during the morning and the afternoon from Monday through Friday. Most public holidays in the period studied were on Mondays, which is clearly visible in the figure.

Our data source is anonymized traffic information from a mobile operator in Argentina, focusing mostly on the Buenos Aires metropolitan area, over a period of 5 months. The raw data logs contain around 50 million calls per day. Call Detail Records (CDR) are an attractive source of location information since they are collected for all active cellular users (about 40 million users in Argentina), and creating additional uses of CDR data incurs little marginal cost.

For our purposes, each record is represented as a tuple ⟨x, y, t, d, l⟩, where user x is the caller, user y is the callee, t is the date and time of the call, d is the direction of the call (incoming or outgoing, with respect to the mobile operator client), and l is the location of the tower that routed the communication. The temporal granularity used in this study is the hour, justified by the findings in [7][13].

From the operator's data, it is possible to obtain direct information on mobile phone usage patterns, as can be seen in Figure 1, which shows the volume of communications according to the day of the week and the hour. The expected contrast between weekend and workweek is evident. More interesting information is given by the communication peaks during the morning (around 11 a.m.) and the afternoon (around 6 p.m.) from Monday through Friday, which depend on the working habits in Argentina. Most public holidays in the period studied were on Mondays, and this shows clearly in Figure 1, if we assume that during holidays people show a calling pattern similar to that of weekends.

III. MOBILITY MODEL
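The model in this section is built on hourly slots of the week, 168 in total, with the week starting on Sunday. As an illustrative sketch (not the code used in the study), a call timestamp t from a record ⟨x, y, t, d, l⟩ can be mapped to its slot as shown here. The function name `week_slot` is hypothetical, and the 1-based numbering (slot 1 is Sunday from midnight to 1 a.m., slot 168 is Saturday from 11 p.m. to midnight) is an assumption chosen to match the numbering used in Figure 2.

```python
from datetime import datetime

HOURS_PER_DAY = 24
SLOTS_PER_WEEK = 7 * HOURS_PER_DAY  # 168 hourly slots


def week_slot(t: datetime) -> int:
    """Return the 1-based hourly slot of the week for timestamp t.

    The week is assumed to start on Sunday, so slot 1 is Sunday 00:00-01:00
    and slot 168 is Saturday 23:00-24:00.
    """
    # datetime.weekday() uses Monday=0 ... Sunday=6; shift so that Sunday=0.
    day_from_sunday = (t.weekday() + 1) % 7
    return day_from_sunday * HOURS_PER_DAY + t.hour + 1
```

Only the day of the week and the hour of the timestamp are used, which reflects the hourly temporal granularity adopted in Section II.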

To predict a user's position, we use a simple model based on previous most frequent locations. We compute the correct prediction probability (i.e. accuracy) as the ratio between the number of correct predictions and the total number of predictions made. In order to compute these locations, we split the week into time slots, one for each hour, totaling 7 × 24 = 168 slots per week. Since humans tend to have very predictable mobility patterns [7][14][15], this simple model turns out to give a good predictability baseline, achieving an average of around […] correct predictions for a period of 2 weeks, training with 15 weeks of data, including peaks of over […] predictability. This model was used as a baseline in [16], with which our results agree. In Figure 2 we show the average predictability for all time slots (considering the week from Sunday to Saturday).

Although the kind of periodic behaviour observed in the figure is widely explained in the literature, it is important to make a few remarks about the results obtained:

• It is clear that predictability is at least […] higher during weekdays (Monday - Friday) than during the weekend.
• During the night, people have a peak of predictability, corresponding to the time they typically spend at home.
• Predictability is slightly higher when computed from outgoing calls than when computed from incoming calls.
• We expect the correct prediction probability to improve, and its curve to be smoother, if we filter users and make predictions only among the ones with a high number of communications. This analysis will be performed in the near future.

Fig. 2. Users' location predictability (correct prediction probability) by time slot. Blue: outgoing calls. Red: incoming calls. Green: all calls (incoming and outgoing). We considered the week starting on Sunday, so the first time slot corresponds to Sunday from midnight to 1 a.m., whereas time slot 168 corresponds to Saturday from 11 p.m. to midnight.

It is important to notice that the accuracy of this simple model could be improved by clustering antennas, since we are considering each antenna as a different location. Although this seems to be a reasonable choice, real-life situations do not fit this scheme perfectly. While a user is at her house, she might be using more than one antenna, and we would consider her to be in two different places. On the other hand, a user might use the same antenna while she is in different locations, like her workplace and the nearby restaurant where she eats lunch. Some of these problems are pathological, and cannot be tackled due to the poor spatial resolution given by antennas (as contrasted, for example, with GPS information). However, several problems can be solved by clustering antennas, where those clusters would represent real locations for users.

IV. MOBILITY PATTERNS

A. Urban Commute
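The commuter definition and the radius of commute (ROC) developed in this subsection can be sketched in a few lines. This is a hedged illustration, not the code used in the study. It assumes that each user's calls have already been split into the weekday night window and the weekday day window, that antennas carry latitude/longitude coordinates in degrees, and that distance is measured along a great circle with the haversine formula (the text does not state which distance was used). The names `radius_of_commute`, `in_city` and `coords` are hypothetical.

```python
import math
from collections import Counter

TAU = 0.8  # commuter threshold used in the text (80%)
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def radius_of_commute(night_calls, day_calls, in_city, coords, tau=TAU):
    """Return the home-work distance in km for one user, or None if the
    user does not qualify as a commuter.

    night_calls, day_calls: antenna ids of the user's calls in each window.
    in_city: function telling whether an antenna lies inside the city.
    coords: dict mapping antenna id to (lat, lon).
    """
    if not night_calls or not day_calls:
        return None
    night_outside = [a for a in night_calls if not in_city(a)]
    day_inside = [a for a in day_calls if in_city(a)]
    # At least tau of night calls outside the city and tau of day calls inside.
    if len(night_outside) < tau * len(night_calls):
        return None
    if len(day_inside) < tau * len(day_calls):
        return None
    home = Counter(night_outside).most_common(1)[0][0]  # busiest night antenna outside
    work = Counter(day_inside).most_common(1)[0][0]     # busiest day antenna inside
    return haversine_km(coords[home], coords[work])
```

Averaging the returned values over all users for whom the result is not None would give an average ROC under these assumptions.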

The phenomenon of commuting is prevalent in large metropolitan areas (often provoking upsetting traffic jams and incidents), and naturally appears in mobile phone data. For instance, in [17] the authors study commute distances in the Los Angeles and New York areas. Mobile data can lead to a quantification of this phenomenon in terms of useful quantities, which are much harder to measure directly. Figure 3 shows the call distribution for each antenna in the area of interest, averaged over a whole month. We include a series of call patterns illustrating the Buenos Aires commute in Figure 4. Red corresponds to a higher number of calls, whereas blue corresponds to an intermediate number of calls and light blue to a smaller one.

Fig. 3. Antenna call distribution in Buenos Aires city and its surroundings (the Greater Buenos Aires). Note that the color scale is different from the one used in Figure 4.

From the data, we can estimate the radius of the commute (ROC), that is, the average distance travelled by commuters. To do so, we take into consideration the two most frequently used antennas as the important places for each user (home and work, see [18]). We first define a night time, when users are usually at home (9 p.m. - 5 a.m. during weekdays), and a day time, when users are usually at their workplaces (12 p.m. - 4 p.m. during weekdays). Afterwards, we count how many times each user makes calls in each of those two time spans, both from inside and outside Buenos Aires city.

To simplify computations, we take a square area roughly corresponding to the country's capital (the autonomous city of Buenos Aires), which is separated by a political boundary from the rest of the large metropolitan area (the Greater Buenos Aires). Around 3 million people live in Buenos Aires city, whereas around 13 million people live in the Greater Buenos Aires, which is among the top 20 largest agglomerations of the world by population.

A large part of the individuals working in Buenos Aires city live in the Greater Buenos Aires, and commute every day. Surveys and estimations state that more than 3 million people commute to Buenos Aires city every day. We therefore define a user as a commuter if she makes most of her night time calls from outside the city and most of her day time calls from inside the city. To perform the experiment, we chose a threshold τ = 80%, meaning that at least τ of a user's night time calls must be made from outside the city and at least τ of her day time calls must be made from inside the city for her to be considered a commuter.

After defining commuters, their home and work locations have to be found in order to compute the radius of commute. We define their home to be the antenna with the highest number of communications from outside the city during night time, and analogously define their work to be the antenna with the highest number of communications inside the city during day time. We then take the distance between those two main antennas as the ROC for each user.

Having made the preceding definitions and assumptions, we compute an average ROC of […] km (as a comparison, the diameter of the city is about 14 km, and the diameter of the considered metropolitan area is 90 km).

We also computed a random ROC, assuming users' locations were randomly distributed in the region of interest and, once more, defining as commuters the users that live outside the city but work inside. The result was a randomized average ROC of […] km.

The previous result confirms an intuitive idea: people's living and working places tend to be closer than what a random distribution would predict.

B. Sports Events

As in the urban commute case, we study human mobility in sports events as seen through mobile phone data. In Figure 5, we show how attendees of a Boca Juniors soccer match converge on the stadium in the hours prior to the game, and disperse afterwards. Average attendance at Boca Juniors home matches is 42,000 people.

Note that postselecting the users attending the event necessarily produces the effect of having no calls outside the chosen area during the match. However, the convergence pattern observed is markedly different from the one seen for the same time slot of the week on a day with no match, as shown in Figure 6.

V. IMPROVING PREDICTABILITY WITH EXTERNAL DATA

So far, our results allow us to understand (and quantify) social events through the analysis of mobile phone data. This understanding can in turn be used to improve the mobility model. Social relations among individuals have been used to improve predictability in mobility models before, as in [16], where social links learned from the mobile phone records are used to this end. Here, instead of peer-to-peer links learned from the mobile data, we show how an external data source can be used to improve the model.

We illustrate this effect using the case study of soccer matches as a proof of concept. Using the soccer fixture, we tag users as "Boca Juniors fans" if they make calls using antennas around the stadium during the time slots of Boca matches for three selected consecutive matches (which include both home and away matches); these matches can be considered part of our training set for the new approach. Using this tagging, we can dramatically improve predictability for this group of Boca fans, even predicting locations that had never been visited by a user before, 1000 km away from her usual location.

In the basic model, we predicted a user's location in a particular time slot to be her most frequent location in that time slot in the training set, whereas in this enriched model, we predict the stadium location (as a cluster of the antennas surrounding it) if the user is a Boca Juniors fan and we are making predictions for a day on which Boca plays a match at that stadium. To evaluate the basic model, we use 15 weeks of data for training purposes, as described in Section III. For the enriched model, we use the same training data, adding the previously mentioned social information (i.e. tagging users) on three consecutive Boca matches in that period. The evaluation is made on the same testing data set in both cases, consisting of the three days on which Boca plays the next matches.

Fig. 4. Commute to Buenos Aires city from the surrounding areas on a weekday, for different hours: (a) 6 a.m., (b) 8 a.m., (c) 10 a.m., (d) 2 p.m., (e) 5 p.m., (f) 6 p.m., (g) 7 p.m., (h) 8 p.m., (i) 10 p.m. The color scale can be seen in image (a), where numbers represent the estimated number of people in a circle of the corresponding color. Image (g), corresponding to 7 p.m., clearly shows the major roads and highways connecting the city center to the North, West and South suburbs.

The predictability of the model for these tagged users, considering the fixture data, rises on the days where there is a Boca Juniors match to […], which doubles the accuracy achieved by our previous model for the same set. Moreover, the initial model is only able to make predictions in […] of the events in the given set (as a consequence of a lack of information in the training set data), whereas the socially enriched model tries to predict […] of the events during match days, which makes the previous results even more significant.

In order to understand these results, we illustrate with a few examples where the enriched model outperforms the simple model:

• The simple model would rarely predict a user's location in a different city or in an unvisited location, whereas the enriched model would do so if the user is a Boca fan and Boca has an away match in that city.
• If Boca usually plays matches on Sundays at 7 p.m., a Boca fan could have a stadium antenna as her most frequent antenna for that particular time slot. Consequently, the simple model would predict her to be in that location on any other Sunday. However, the enriched model would not do so for away matches, and can even take into account that the season is over, and therefore predict the location of the following most frequent antenna.

Fig. 5. Convergence to Boca Juniors stadium in the hours prior to a soccer match, and dispersal after its end: (a) 5 hours before, (b) 2 hours before, (c) 1 hour before, (d) 1 hour after, (e) 2 hours after, (f) 3 hours after. The color scale can be seen in image (a), where numbers represent the estimated number of people in a circle of the corresponding color.

Fig. 6. Similar to Figure 5, on a day with no Boca match: (a) 5 hours before, (b) 1 hour before, (c) 1 hour after, (d) 3 hours after.
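The basic and enriched models compared in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: time slots are precomputed integers, locations are plain labels (with the antennas around a stadium already mapped to a single cluster label), the fixture is a dictionary keyed by (date, slot), and all function and parameter names are hypothetical. The refinement of falling back to the next most frequent antenna once the season is over is not implemented.

```python
from collections import Counter, defaultdict


def train_baseline(training):
    """Build the basic model from (user, slot, location) observations:
    for each (user, slot), keep the most frequent location."""
    counts = defaultdict(Counter)
    for user, slot, location in training:
        counts[(user, slot)][location] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict(model, user, slot, date, fans=frozenset(), fixture=None):
    """Predict a location, or return None when no prediction can be made.

    Enriched rule: if the user is tagged as a fan and the fixture lists a
    match at (date, slot), predict the stadium cluster of that match.
    Otherwise fall back to the basic most-frequent-location rule.
    """
    if fixture and user in fans and (date, slot) in fixture:
        return fixture[(date, slot)]
    return model.get((user, slot))


def accuracy(model, events, **enrichment):
    """Return (correct / predictions made, predictions made) over test
    events given as (user, slot, date, actual location)."""
    made = correct = 0
    for user, slot, date, actual in events:
        guess = predict(model, user, slot, date, **enrichment)
        if guess is None:
            continue  # no training data for this (user, slot)
        made += 1
        correct += guess == actual
    return (correct / made if made else 0.0), made
```

Calling `accuracy(model, events)` evaluates the basic model, while `accuracy(model, events, fans=..., fixture=...)` evaluates the enriched one on the same test events. Returning the number of predictions made alongside the accuracy mirrors the observation above that the basic model can only make predictions for part of the events.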

VI. CONCLUSION AND FUTURE WORK

We illustrated how social phenomena can be studied through the lens of mobile phone data, which can be used to quantify different aspects of these phenomena with great practicality. Furthermore, we showed how including external information about these phenomena can improve the predictability of human mobility models.

Although we showed this in a specific case as a proof of concept experiment, we note that this procedure can be extended to other settings, not restricted to sports but including cultural events, vacation patterns and so on (see [9] for an especially relevant application). The tagging obtained is useful on its own and is of great value for mobile phone operators. The big challenge in this line of work is to manage to include external data sources in a systematic way.

The results obtained, as well as interesting ideas and questions related to this subject that were not addressed here, open a broad perspective for future work. A simple approach to improve these metrics is to cluster antennas in order to consider clusters as locations, which is much more realistic than considering a single antenna as a location. The next step would be to keep working on more complex location prediction models, especially using social information, in order to improve the obtained accuracy and predictability rate.

Moreover, an important goal is to include external data sources in a systematic way, trying to work with more and possibly bigger communities other than the "Boca Juniors fans" community, and analyzing how social information modifies the predictions made by models during longer periods.

The communities idea shows a complementary way to use social information in order to improve the models' predictions, by taking tag-based predictions to the community level. Defining, for instance, the "Boca Juniors fans" community, we can predict that if some users of this community make or receive calls in a certain location, other users in the community will do so as well.

ACKNOWLEDGMENTS

The authors would like to thank Matias Travizano and Martin Minnoni for their ideas and suggestions, and the anonymous reviewers for their feedback.

REFERENCES

[1] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[2] Vincent Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[3] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.
[4] Qinna Wang and Eric Fleury. Community detection with fuzzy community structure. In Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on, pages 575–580. IEEE, 2011.
[5] Thomas Aynaud and Jean-Loup Guillaume. Static community detection algorithms for evolving networks. In WiOpt'10, pages 513–519. IEEE, 2010.
[6] Carlos Sarraute and Gervasio Calderon. Evolution of communities with focus on stability. In Third International Conference on the Analysis of Mobile Phone Datasets (NetMob), 2013.
[7] Chaoming Song, Zehui Qu, Nicholas Blumm, and Albert-László Barabási. Limits of predictability in human mobility. Science, 327(5968):1018–1021, 2010.
[8] Manlio De Domenico, Antonio Lima, and Mirco Musolesi. Interdependence and predictability of human mobility and social interactions. Nokia Mobile Data Challenge Workshop, 2012.
[9] Xin Lu, Linus Bengtsson, and Petter Holme. Predictability of population displacement after the 2010 Haiti earthquake. Proceedings of the National Academy of Sciences, 109(29):11576–11581, 2012.
[10] Tommy Nguyen and Boleslaw K Szymanski. Using location-based social networks to validate human mobility and relationships models. In Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on, pages 1215–1221. IEEE, 2012.
[11] Dashun Wang, Dino Pedreschi, Chaoming Song, Fosca Giannotti, and Albert-Laszlo Barabasi. Human mobility, social ties, and link prediction. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1100–1108. ACM, 2011.
[12] Jessica Leber. How wireless carriers are monetizing your movements. MIT Technology Review, 2013.
[13] Chaoming Song, Tal Koren, Pu Wang, and Albert-László Barabási. Modelling the scaling properties of human mobility. Nature Physics, 6(10):818–823, 2010.
[14] Marta C Gonzalez, Cesar A Hidalgo, and Albert-Laszlo Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, 2008.
[15] Shan Jiang, Joseph Ferreira, and Marta C González. Clustering daily patterns of human activities in the city. Data Mining and Knowledge Discovery, pages 1–33, 2012.
[16] Eunjoon Cho, Seth A Myers, and Jure Leskovec. Friendship and mobility: user movement in location-based social networks. In ACM SIGKDD, pages 1082–1090. ACM, 2011.
[17] Sibren Isaacman, Richard Becker, Ramón Cáceres, Stephen Kobourov, Margaret Martonosi, James Rowland, and Alexander Varshavsky. Identifying important places in people's lives from cellular network data. Pervasive Computing, pages 133–151, 2011.
[18] Balázs Cs Csáji, Arnaud Browet, VA Traag, Jean-Charles Delvenne, Etienne Huens, Paul Van Dooren, Zbigniew Smoreda, and Vincent D Blondel. Exploring the mobility of mobile phone users.