When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy
Maho ASADA, Masatoshi YOSHIKAWA, and Yang CAO
Kyoto University, Kyoto, Japan
[email protected], {yoshikawa, yang}@i.kyoto-u.ac.jp

Abstract.
In recent years, it has become easy to obtain location information quite precisely. However, acquiring such information carries risks such as individual identification and leakage of sensitive information, so the privacy of location information must be protected. For this purpose, people should know their location privacy preferences, that is, whether or not they can release location information at each place and time. However, it is not easy for users to make such decisions, and it is troublesome to set the privacy preference each time. Therefore, we propose a method that recommends location privacy preferences to support decision making. Compared to the existing method, our method improves the accuracy of recommendation by using matrix factorization and preserves privacy strictly through local differential privacy, whereas the existing method achieves no formal privacy guarantee. In addition, we identify the best granularity of a location privacy preference, that is, how to express the information used in location privacy protection. To evaluate and verify the utility of our method, we integrated two existing datasets to create one that is rich in terms of the number of users. The results of the evaluation on this dataset confirm that our method predicts location privacy preferences accurately and provides a suitable way to define them.
Keywords:
Privacy preference · Location data · Matrix factorization · Local differential privacy.
1 Introduction

In recent years, due to the popularization of smartphones and the development of GPS positioning equipment, people's location information can be obtained precisely and easily. Such data can be utilized in various fields such as marketing and urban planning. In addition, many applications do not function effectively without location information [22] [33]. Because of such value, marketplaces for buying and selling location data have emerged [1] [22]. On the other hand, however, publishing accurate location information carries privacy risks, such as individuals being identified [13].
Due to such risks, privacy awareness regarding location information is very high [24]. One of the riskiest situations is smartphone use, because we send location information to many applications [16].

To prevent privacy risks under such circumstances, it is necessary to anonymize or obfuscate location information. One primitive countermeasure is to turn off location information transmission manually when using a smartphone. Another is to apply a location privacy protection technique; various techniques are used for protection, including k-anonymity [38] [19], differential privacy [14] [20], and encryption [41]. Fawaz [16] proposed a system that applies these privacy protection technologies to smartphones. This system controls the accuracy of the location information sent to each application, and the user needs to specify how accurately he/she wants to send location information to each one.

With these countermeasures, to avoid a privacy risk from the disclosure of location information, the user has to decide a location privacy preference for each location, that is, whether or not to publish the location data at a certain place and time. However, such a situation poses a problem: the best privacy preference is unclear, and it may differ for each user. Therefore, individual users need to determine their location privacy preferences. However, most users find it difficult to determine these preferences themselves [34], and it is troublesome to set the privacy preference each time. Of course, sending all the location information to a third party such as an application must be avoided.

Therefore, we need a system that recommends location privacy preferences, both to support decision making when a user chooses a location privacy preference and to promote the safe release of location information.
Recently, one such system was developed using the concept of item recommendation, which is used in online shopping. It regards the combination of location and time as an item and whether or not to release location information as the rating of the item, and it predicts the ratings of unknown items. Zhang [43] proposed a recommendation method based on collaborative filtering.

We focus on the problems of existing location privacy preference recommendation methods and propose a recommendation method that solves them. The contributions of this research are as follows.

1. Clarifying the definition of the location privacy preference: How to define time and place in a location privacy preference has not been clarified so far. For example, regarding time, "2 pm, February 1st, 2019" can be represented as "a Friday afternoon of February", "an afternoon", and so on. Since the granularity affects the usefulness of a recommendation, it is necessary to identify the best representation. Although many location privacy protection methods have been proposed in the literature, none of them addresses how to set the location privacy preference. Therefore, we generate recommendation models using various granularities of time information and compare their usefulness. From these results, we find the best granularity, which balances the trade-off between the density of the rating data used for the recommendation and the consideration of time.

2. Applying matrix factorization to location privacy preference recommendation: Accurate recommendation is very important, since location privacy preferences are very sensitive. Collaborative filtering, which was used in the method by Zhang [43], has two problems: accurate prediction cannot be achieved when the numbers of users and items increase, and only the nature of either the user or the item can be considered well. Therefore, we propose a method that improves on this by utilizing matrix factorization. Our experiments confirm that we can predict ratings accurately with a probability of 90% for a large amount of data.

3. Recommendation with local differential privacy: Matrix factorization involves a privacy risk, because each user needs to send his/her data to the recommendation system. Location privacy preferences are very sensitive because they encompass the user's location at certain times and whether that information is sensitive for him/her. The recommendation system obtaining such data may be on an untrusted server, through which an attacker may try to extract the users' data [17]. However, no location privacy preference recommendation achieving rigorous privacy protection has been proposed so far; in fact, the method by Zhang [43] does not preserve privacy in a strict mathematical sense. Therefore, we propose a recommendation method that realizes this with local differential privacy, in which each user adds noise to his/her own data before transmission, enabling data to be sent to any system. We confirm that our method maintains precision comparable to that of a method without privacy protection.

4. Generating a location privacy preference dataset: A challenge in testing the performance of our method is that no appropriate location privacy preference dataset is available in the literature. The only available real-world dataset of location privacy preferences [31] has few users; such data is not suitable for evaluating a method using matrix factorization [23]. In addition, we need bulk data in the evaluation because recommendations in the real world have many users. Therefore, we created an artificial dataset that combines this location privacy preference dataset with a trajectory dataset containing a large number of users.

This paper is organized as follows. We describe the related work in Section 2 and the knowledge necessary for realizing our goal in Section 3. We then describe our method in Section 4 and evaluate and discuss it in Section 5.
2 Related Work

In order to protect the privacy of location information, it is necessary to know the location privacy preferences, but this task is difficult. The recommendation of location privacy preferences is useful in such conditions. There are two main types of recommendations: one that predicts preferences based on only one user's data [15] [34] and one that uses the data of multiple users [39] [21].

Table 1: A comparison of our method with related work [43]

                     Method                            How to preserve privacy
  Related work [43]  Collaborative filtering           Adds noise that is not
                     (inaccurate for a large           mathematically strict
                     amount of data)
  Our method         Matrix factorization              Adds noise based on
                     (accurate for a large             local differential privacy
                     amount of data)

With the former type, Sadeh [34] proposed a recommendation method that uses machine learning such as the random forest algorithm. However, a system that uses only one user's data cannot predict accurately when the user has little data about his/her privacy preferences [43]. Such a problem is very likely to occur, since it is difficult for users to determine their own location privacy preferences themselves.

A recommendation based on multiple users' data can solve this problem. One approach applies the item recommendation used in online shopping to the recommendation of location privacy preferences. It regards the combination of location and time as an item and predicts the ratings of unknown items using collaborative filtering [32] [43]. However, this method has three disadvantages: the accuracy decreases as the numbers of users and items increase, only the nature of either the user or the item can be considered well, and there is a possibility of threatening the user's privacy. Therefore, we propose a location privacy preference recommendation system that improves the utility and preserves privacy.
Item recommendation has privacy risks [6] [25] [29], so recommendation methods that aim to preserve privacy use differential privacy [4] [5] [27] [28]. In these methods, the recommendation system adds noise after collecting the users' ratings of items, and such a system is assumed to be on a trusted server. However, there is no guarantee that all recommendation systems exist on trusted servers, and there are cases where privacy is threatened.

To avoid such risks, local differential privacy is used, where each user adds noise to his/her data before sending it to the recommendation system. The number of methods that apply local differential privacy to item recommendation is increasing [35] [36] [18]. The method by Shin [37] adds noise to both the items and the ratings, improving the privacy preservation.

However, there has been no location privacy preference recommendation that preserves privacy. Therefore, we propose such a recommendation, referring to the method by Shin [37]. A comparison of our method with related work is shown in Table 1.
3 Preliminaries

In location privacy preservation, it is important to define the privacy goal, that is, which information we protect. This definition influences the recommendation of privacy preferences and the preservation of location privacy. There are ways to define a level that preserves location privacy regardless of location and time [3] [2] [42]; the idea is to preserve location privacy at the same level at all locations and times. However, users want to change the level according to the location and time. Therefore, we should define where and when we want to preserve location privacy, but there is no method for this definition. Cao et al. [9] [11] [12] proposed a method to define the privacy goal that consists of spatial and temporal goals. However, even these studies did not mention how to define information such as the place and time. Therefore, we propose a way to define such information in the best possible way.
Matrix factorization is one of the most popular methods for item recommendation, which predicts the ratings of unknown items. It is an extension of collaborative filtering that improves the accuracy for large amounts of data by dimensionality reduction.

We consider the situation in which $m$ users rate some of $n$ items (e.g., movies). We denote the set of known (user, item) rating pairs by $\mathcal{M} \subset \{1, \dots, m\} \times \{1, \dots, n\}$ and the number of ratings by $M = |\mathcal{M}|$, where $r_{ij}$ is user $i$'s rating of item $j$. Matrix factorization predicts the ratings of unknown items given $\{r_{ij} : (i,j) \in \mathcal{M}\}$. To make a prediction, we consider an $m \times n$ ratings matrix $R$, a $d \times m$ user matrix $U$, and a $d \times n$ item matrix $V$ that satisfy

$R \approx U^T V$

User $i$'s rating of item $j$ is obtained from the inner product of $u_i$ and $v_j$, where the user profiles $u_i \in \mathbb{R}^d$ ($1 \le i \le m$) and the item profiles $v_j \in \mathbb{R}^d$ ($1 \le j \le n$) are learned from the known ratings. In learning, we obtain the matrices $U$ and $V$ that minimize

$\frac{1}{M} \sum_{(i,j)\in\mathcal{M}} (r_{ij} - u_i^T v_j)^2 + \lambda_u \sum_{i=1}^{m} \|u_i\|^2 + \lambda_v \sum_{j=1}^{n} \|v_j\|^2$ (1)

where $\lambda_u$ and $\lambda_v$ are positive regularization parameters. $U$ and $V$ are obtained by iterating the following updates:

$u_i^t = u_i^{t-1} - \gamma_t \{\nabla_{u_i} \varphi(U^{t-1}, V^{t-1}) + 2\lambda_u u_i^{t-1}\}$ (2)

$v_j^t = v_j^{t-1} - \gamma_t \{\nabla_{v_j} \varphi(U^{t-1}, V^{t-1}) + 2\lambda_v v_j^{t-1}\}$ (3)

Here, $\gamma_t$ is the learning rate at the $t$-th iteration, and $\nabla_{u_i} \varphi(U, V)$ and $\nabla_{v_j} \varphi(U, V)$ are the gradients with respect to $u_i$ and $v_j$, obtained by differentiating (1):

$\nabla_{u_i} \varphi(U, V) = -\frac{2}{M} \sum_{j:(i,j)\in\mathcal{M}} v_j (r_{ij} - u_i^T v_j)$ (4)

$\nabla_{v_j} \varphi(U, V) = -\frac{2}{M} \sum_{i:(i,j)\in\mathcal{M}} u_i (r_{ij} - u_i^T v_j)$ (5)

We predict the ratings of the unknown items by computing $U$ and $V$ with these formulae.

We use local differential privacy to extend matrix factorization into a form that satisfies privacy preservation.
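As a non-private reference point, the gradient-descent updates of equations (2)-(5) can be sketched in a few lines of NumPy. The function name, hyperparameter defaults, and the toy data below are our own illustration, not the paper's implementation.

```python
import numpy as np

def mf_fit(ratings, m, n, d=8, lam_u=0.01, lam_v=0.01, gamma=0.05, iters=200, seed=0):
    """Fit R ~= U^T V on known ratings by gradient descent (Eqs. (1)-(5)).

    ratings: list of (i, j, r_ij) triples with known ratings.
    Returns (U, V), a d x m user matrix and a d x n item matrix.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((d, m))
    V = 0.1 * rng.standard_normal((d, n))
    M = len(ratings)
    for _ in range(iters):
        gU = np.zeros_like(U)
        gV = np.zeros_like(V)
        for i, j, r in ratings:
            err = r - U[:, i] @ V[:, j]
            gU[:, i] += -(2.0 / M) * V[:, j] * err   # Eq. (4)
            gV[:, j] += -(2.0 / M) * U[:, i] * err   # Eq. (5)
        U -= gamma * (gU + 2 * lam_u * U)            # Eq. (2)
        V -= gamma * (gV + 2 * lam_v * V)            # Eq. (3)
    return U, V
```

A predicted rating is then the inner product `U[:, i] @ V[:, j]`. Local differential privacy, introduced next, lets each user perturb what is sent to the server so that a scheme of this form can run without a trusted party.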
This approach is an extension of differential privacy [14], a mechanism that protects the sensitive data in a database from attackers while enabling accurate statistical analysis of the entire database [30]. In differential privacy, a trusted server adds noise to the data collected from the users. However, we assume that the server cannot be trusted and use local differential privacy, in which each user adds noise to the data before sending it to the server. The idea behind local differential privacy is that, for a certain user, the statistical result should not change regardless of whether or not the user has certain data. The definition is given below:
Definition 1 (Local differential privacy)
Let $x$ and $x'$ be any pair of inputs. A mechanism $\mathcal{M}$ satisfies ε-local differential privacy if, for every $S \subseteq Range(\mathcal{M})$,

$Pr[\mathcal{M}(x) \in S] \le \exp(\epsilon) \, Pr[\mathcal{M}(x') \in S]$

where $Range(\mathcal{M})$ is the set of all outputs that $\mathcal{M}$ may generate. A randomized response [40] is used to realize local differential privacy; it decides the value to output based on a specified probability when a certain value is input. Each user can add noise to the data according to his/her own privacy awareness, since he/she can decide the probability. The location privacy preference is defined as follows:
Definition 2 (Location privacy preference)
The location privacy preference $p_u(t, l)$, which expresses whether user $u$ wants to hide location information at time $t$ in location $l$, is given by

$p_u(t, l) = \begin{cases} 1 & (Positive) \\ 0 & (Negative) \end{cases}$ (6)
Positive means that he/she can publish the location information, and Negative means that he/she does not want to publish it. The time $t$ is expressed as a slot of time divided by a certain criterion. The division method varies depending on the reference length; as an example, a day can be divided as {Morning, Noon, Afternoon, Evening, Night, Midnight}. The location $l$ is represented by a combination of geographic information and a category of place, or by either of these. Geographic information represents the latitude and longitude or a certain fixed area, and the category represents the property of a building located at that place, such as a restaurant or a school.

There are various granularities for representing this information. As for the time $t$, "February 1st, 2019" can be expressed as "Afternoon", "Weekday, Afternoon", or "February, Weekday, Afternoon". As for the location $l$, "Charleston Airport (Address: 5500 International Blvd, Charleston, SC 29418, the US)" can be expressed as "Airport" or "Airport, Charleston". As the granularity changes, the number of items in the recommendation and the degree to which the nature of an item is considered change, which influences the utility of the recommendation. However, it is not clear what granularity is best. Therefore, we determine the best granularity, that is, the best definition of the location privacy preference.

4 Proposed Method

We propose a location privacy preference recommendation method that preserves privacy. When a user enters location privacy preferences for a certain number of place and time combinations, the method outputs location privacy preferences for unknown combinations. We assume that the recommendation system exists on an untrusted server and that an attacker tries to extract the locations and times of the user's visits and their ratings based on the output of the system.
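To illustrate the time-slot granularity discussed above, a minimal helper (our own, not part of the paper's system) might map a timestamp to a slot given the criterion length in hours:

```python
from datetime import datetime

# Fixed English day names so the output does not depend on the locale.
_DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday")

def time_slot(ts: datetime, hours: int) -> str:
    """Map a timestamp to a coarse time slot of the given length in hours.

    With hours=6 a day splits into 4 slots; coarser slots make the ratings
    matrix denser at the cost of blurring the time of day.
    """
    if 24 % hours != 0:
        raise ValueError("slot length must divide 24")
    return f"{_DAYS[ts.weekday()]}-slot{ts.hour // hours}"
```

For example, with `hours = 6`, "2 pm, February 1st, 2019" falls into the slot `"Friday-slot2"`; the trade-off between slot length and matrix density is exactly the granularity question examined in the evaluation.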
Our method aims at compatibility between high utility as a recommendation scheme and privacy protection: we realize the former by matrix factorization and the latter by local differential privacy.

First, we show the rough flow of recommending location privacy preferences with normal matrix factorization.
1. The recommendation system sends the user matrix $U$ and the item matrix $V$ to each user.
2. Each user calculates the gradients using his/her own data and the matrices from step 1 and sends them to the system.
3. The system updates $U$ and $V$ with the gradients calculated in step 2.
These steps are repeated a number of times, and there is a risk that the users' data are leaked to the attacker.
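To make step 2 and its privacy fix concrete, the sketch below (our own naming and simplifications) shows a per-user perturbation in the style described in the rest of this section (randomized response on the visit indicators, Laplace noise on the gradient entries), together with a server-side unbiased aggregation in the form of equation (10).

```python
import math
import random

def perturb_user_gradients(y, g, p, noise_scale, rng=random):
    """Per-user local perturbation: flip visit indicators y_ij by a
    randomized response and add Laplace noise to each gradient entry.

    y: list of 0/1 visit indicators, one per item.
    g: list of per-item gradient vectors (lists of floats).
    p: total flip probability of the randomized response.
    noise_scale: scale b of the Laplace noise.
    """
    def laplace(b):
        # Inverse-CDF sampling of a centered Laplace(b) variable.
        u = rng.random() - 0.5
        return -b * math.copysign(math.log(1 - 2 * abs(u)), u)
    y_star, g_star = [], []
    for y_ij, g_ij in zip(y, g):
        r = rng.random()
        if r < p / 2:
            y_star.append(1)        # forced 1 with probability p/2
        elif r < p:
            y_star.append(0)        # forced 0 with probability p/2
        else:
            y_star.append(y_ij)     # true value with probability 1 - p
        g_star.append([v + laplace(noise_scale) for v in g_ij])
    return y_star, g_star

def aggregate(all_y_star, all_g_star, p):
    """Server-side unbiased average over users for each item j:
    (1/m) * sum_i ((y*_ij - p/2) / (1 - p)) * g*_ij."""
    m = len(all_y_star)
    n_items = len(all_y_star[0])
    d = len(all_g_star[0][0])
    grads = [[0.0] * d for _ in range(n_items)]
    for y_star, g_star in zip(all_y_star, all_g_star):
        for j in range(n_items):
            w = (y_star[j] - p / 2) / (1 - p)
            for l in range(d):
                grads[j][l] += w * g_star[j][l] / m
    return grads
```

The weight $(y^*_{ij} - p/2)/(1-p)$ makes the aggregate unbiased because $E[y^*_{ij}] = p/2 + (1-p)\,y_{ij}$; with $p = 0$ and zero noise scale, the scheme reduces to the plain flow above.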
Fig. 1: Overview of our method

Therefore, we add noise to the data in step 2 above to avoid such risks. When a user calculates the gradients for updating $U$ and $V$, noise that satisfies local differential privacy is added to the information regarding "when and where the user visits" and "whether he/she wants to publish the location information". The gradients calculated from the noise-added data are sent to the recommendation system. An overview of our method is shown in Fig. 1.

If we repeat the above steps $k$ times, the gradient differs each time, so an attacker may obtain some information by comparing the gradients. To avoid such an attack, we propose a method that satisfies ε-local differential privacy for all combinations of the gradients.

We preserve privacy for two pieces of information: when and where the user visits, and whether he/she wants to publish the location information. In the privacy protection process, we refer to the method by Shin [37].

First, we describe how to add noise to the information regarding the time and location of the user's visits. Let $y_{ij}$ be a value indicating whether or not user $i$ has visited place $j$: 1 if he/she has visited it and 0 otherwise. The following equation holds:

$\sum_{(i,j)\in\mathcal{M}} (r_{ij} - u_i^T v_j)^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} y_{ij} (r_{ij} - u_i^T v_j)^2$

Therefore, equation (5) can be transformed as follows:

$\nabla_{v_j} \varphi(U, V) = -\frac{2}{M} \sum_{i=1}^{m} y_{ij} u_i (r_{ij} - u_i^T v_j)$ (7)

To protect the information regarding the time and location of the user's visits from privacy attacks, we should add noise to the vector $Y_i = (y_{ij})_{1 \le j \le n}$, which represents whether or not the user has evaluated each item. We use a randomized response, and the value $y_{ij}^*$ obtained by adding noise to $y_{ij}$ is given as follows:
$y_{ij}^* = \begin{cases} 1 & \text{with probability } p/2 \\ 0 & \text{with probability } p/2 \\ y_{ij} & \text{with probability } 1 - p \end{cases}$ (8)

Next, we describe the privacy protection for the information on whether to disclose location information for a certain combination of place and time. We add noise $\eta_{ijl}$ to each component of $g_{ij} = (g_{ijl})_{1 \le l \le d} = -u_i (r_{ij} - u_i^T v_j)$. The noise follows a Laplace distribution, and the noise-added value $g_{ijl}^*$ is expressed as

$g_{ijl}^* = g_{ijl} + \eta_{ijl}$ (9)

Each user adds noise to his/her own data in this way, and the noise-added gradients $\{(y_{ij}^*, g_{ij1}^*, \dots, g_{ijd}^*) : j = 1, \dots, n\}$ are sent. The gradient used for updating the matrix $V$ is the average of the users' gradients, $\nabla_{v_j} = \frac{1}{m} \sum_{i=1}^{m} y_{ij} g_{ij}$, and its noise-added counterpart is

$\nabla_{v_j}^* = \frac{1}{m} \sum_{i=1}^{m} \frac{y_{ij}^* - p/2}{1 - p}\, g_{ij}^*$ (10)

By repeating $k$ times the operation of updating the matrices with the gradients calculated from the noise-added data, we obtain the matrices for predicting the ratings.

5 Evaluation

In this section, we describe the evaluation metrics and the viewpoints considered when verifying the utility of our method. We evaluate two points: the approximation between the true ratings and the predicted ones, and the accuracy of the prediction. We describe the details of these metrics in Section 5.3.

In the evaluations, we compare the utilities of the recommendation methods using normal matrix factorization and local differential privacy. In addition, we evaluate the method from the following three viewpoints:
1. What is the best definition of the location privacy preference?
2. How much impact do changes in the privacy preservation level have?
3. What impact do changes in the number of unknown evaluation values have?
Table 2: A measure representing true or false values of the result

                                True rating
                      Positive                Negative
  Predicted  Positive TP (True Positive)      FP (False Positive)
  rating     Negative FN (False Negative)     TN (True Negative)
In the evaluation, we use artificial data combining a location privacy preference dataset and a location information dataset.

For the location privacy preference dataset, we use LocShare, acquired from the data archive CRAWDAD [31]. This dataset was obtained from 20 users in London and St. Andrews over one week from April 23 to 29, 2011, with privacy preference data for 413 places. This dataset has few users, so it is not suitable for evaluating our method, which uses matrix factorization [23]. In addition, we need bulk data in the evaluation because recommendations in the real world have many users.

Therefore, we generated an artificial dataset by combining the location privacy preference dataset with the trajectory dataset Gowalla, which was acquired from a location-based SNS in the US. This dataset includes check-in histories of various places from 319,063 users collected from November 2010 to June 2011. The total number of check-ins is 36,001,959, and the number of checked-in places is 2,844,145.
We describe the metrics for verifying the utility of our method.
The approximation when comparing the entire matrix:
We measure the similarity of two matrices as a whole: the matrix $R$ representing the true ratings and the matrix $U^T V$ representing the estimated ones.

Originally, each rating is either 1 (Positive: can be released) or 0 (Negative: cannot be released). However, the elements of the product $U^T V$ are not binary, so each is classified as 1 or 0 using a threshold, which we set to the mean of the product's elements. We measure how accurately the recommendation predicts the ratings. The predicted values can be classified against the true ratings as in Table 2. For example, TP (True Positive) is the number of data points that are truly Positive (can be released) and whose predicted values are also Positive. We count the TP, FP, TN, and FN results of the prediction and measure the following two indices.

– False Positive Rate: the percentage of predicted false positives relative to the number of negatives.
$FPR = \frac{FP}{TN + FP}$
Fig. 2: An example in which part of the ratings of some users is regarded as unevaluated. In the evaluation, we measure how correctly the yellow-green part is predicted.

This metric verifies whether location information that the user does not want to disclose is erroneously disclosed. The lower the false positive rate, the higher the accuracy of the privacy protection.

– Recall: the proportion of the data that are actually Positive that are predicted to be Positive.
$Recall = \frac{TP}{TP + FN}$
Recall is an index for verifying whether the location information that the user can publish is predicted to be Positive, since the benefit decreases if less location information is released. The higher the recall, the higher the utility of the recommendation.
Accuracy of the prediction for each user:
In this experiment, as shown in Fig. 2, part of the rated items of some users is regarded as unevaluated. We focus on the items regarded as unevaluated in the recommendation and measure the ratio of correctly predicted ratings relative to the true ratings. We call this ratio the reconstruction rate, which is calculated as follows:
$Reconstruction\ rate = \frac{TP + TN}{TP + FP + TN + FN}$
The higher the reconstruction rate, the more useful the recommendation.
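The three metrics can be computed directly from the binary true and predicted ratings; the mean-threshold binarization mirrors the classification described above. The helper names below are our own.

```python
def binarize(scores, threshold=None):
    """Binarize predicted scores; the threshold defaults to the mean of
    the scores, as in the evaluation described above."""
    if threshold is None:
        threshold = sum(scores) / len(scores)
    return [1 if s >= threshold else 0 for s in scores]

def metrics(true, pred):
    """False positive rate, recall, and reconstruction rate from binary
    true/predicted ratings (1 = Positive, 0 = Negative)."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(true, pred) if t == 0 and p == 0)
    fpr = fp / (tn + fp)
    recall = tp / (tp + fn)
    reconstruction = (tp + tn) / (tp + fp + tn + fn)
    return fpr, recall, reconstruction
```

Note that a method can trivially achieve zero FPR by predicting everything Negative, which is why recall is reported alongside it.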
We evaluate our method from the three viewpoints mentioned in Section 5.1, adopting the following procedure.
1. We use 10-fold cross validation, dividing the users into training data and test data: 90% of the users are regarded as training data and 10% as test data.
2. Among the data of the users in the test data, we regard known ratings as unknown according to the value of UnknownRate, described later.
3. We predict the ratings using the converted test data and the training data and calculate the metrics.
4. We repeat the above process 100 times and report the averages of the evaluation metrics.

In the evaluation, we vary the value of one of the following parameters: time, ε, and UnknownRate. time is the length of the criterion for dividing time into multiple slots, ε is the privacy protection level of local differential privacy, and UnknownRate is the ratio of ratings regarded as unknown.
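Step 2 of this procedure, hiding a fraction UnknownRate of a test user's known ratings, can be sketched as follows (a simplified illustration with our own naming):

```python
import random

def mask_ratings(user_ratings, unknown_rate, rng=random):
    """Hide a fraction (UnknownRate) of a test user's known ratings.

    user_ratings: dict mapping item -> rating (0 or 1).
    Returns (visible, held_out): the ratings kept as input to the
    recommender, and the held-out ones whose predictions are scored.
    """
    items = sorted(user_ratings)
    k = int(len(items) * unknown_rate)
    hidden = set(rng.sample(items, k))
    visible = {j: r for j, r in user_ratings.items() if j not in hidden}
    held_out = {j: user_ratings[j] for j in hidden}
    return visible, held_out
```

The held-out ratings are exactly the "yellow-green part" of Fig. 2: predictions are compared against them to compute the metrics.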
We describe the results of the experiments confirming the utility of the recommendation method.
The best location privacy preference definition:
When defining the location privacy preference, the best granularity of the information is not yet clear. In this experiment, we examine the influence of the time granularity on the utility. We vary the criterion for dividing time into multiple slots over several lengths, including 2, 6, 8, and 12 hours; the other parameters are set as follows: ε = 0.01 and UnknownRate = 0.1. The results are shown in Fig. 3.

From these results, we confirm that the precision drops when the granularity is small, that is, when the criterion time is short. This is because, as the granularity decreases, the number of items in the recommendation increases and the matrix used for the recommendation becomes sparse. On the other hand, the accuracy also drops when the granularity is too large, as seen in the results for time = 8 and 12, because considering the time of day becomes difficult. From these results, we confirm that the granularity should be chosen in the definition considering the trade-off between these utilities.

When we compare the recommendations using normal matrix factorization and local differential privacy, there is little difference in utility. Therefore, this comparison also shows the possibility of accurate recommendation while protecting privacy.
Impact of changes in the privacy preservation level:
The strength of the privacy protection can be adjusted with the value ε in local differential privacy: the smaller the value of ε, the more privacy is protected. On the other hand, the utility of the recommendation may decrease as the added noise becomes large. Therefore, we examine the influence of changing the privacy protection level on the utility.

Fig. 3: Results when the time granularity is changed: (a) false positive rate, (b) recall, (c) reconstruction rate.

In this evaluation, we vary the privacy protection level ε over several values down to 0.01; the other parameters are set as follows: time = 6 and UnknownRate = 0.1. The results are shown in Fig. 4.

From the results, we confirm that a normal recommendation is more useful in general, and that the utility increases as the value of ε increases. On the other hand, the change in the utility due to the change in ε is small when ε is small.

Impact of changes in the number of unknown evaluation values:
We verify how much each user should know about his/her own privacy preferences for an accurate prediction. In the evaluation, we regard a certain number of rated values as unrated when generating the model and verify the influence of the number of unknown ratings on the utility. We vary the parameter UnknownRate, the percentage of data regarded as unrated, over several values; the higher the UnknownRate, the fewer known ratings remain. The other parameters are set as follows: time = 6, with ε fixed.

Fig. 4: Results when ε is changed.

From these results, we confirm that the utility tends to decrease as the number of unrated values increases. This is because the accuracy of the recommendation is reduced when the training dataset is small. On the other hand, the reconstruction rate shows that learning more data can make accurate recommendation harder, because overfitting occurs with a large amount of training data and a model specialized to specific data is generated.

From these results, a large amount of training data is necessary for accurate recommendations, but we have to control the amount of data to generate a versatile model. We also confirm that a user who does not have a large amount of data can still use the recommendation accurately.

In this section, we discuss the experimental results. First, we compare the utility of the models using normal matrix factorization and local differential privacy. Since appreciable differences were not observed when tuning the parameters, we confirm that our recommendation method maintains an accuracy comparable to that of normal matrix factorization.

Fig. 5: Results when the UnknownRate is changed: (a) false positive rate, (b) recall, (c) reconstruction rate.

Next, we discuss the influence of changing the following parameters: the granularity of the location privacy preference, the privacy protection level, and the number of unknown ratings. The best parameters in this evaluation can change according to the data, but the method of selecting them can be applied to any data.

As for the granularity of the location privacy preference, as the granularity becomes coarser, that is, as the criterion for dividing time into slots becomes longer, the utility first improves and then worsens beyond a certain value. This is because the coarser the granularity, the smaller the number of items in the recommendation, and the denser the matrix used for prediction. On the other hand, when the granularity is too coarse, the property of time can no longer be considered properly, and the accuracy decreases. Therefore, in defining the location privacy preference, we should choose the criterion with the highest utility; in this evaluation, the best criterion is 8 hours.

Regarding the privacy protection level, the utility improves as the value of ε increases, but the rate of improvement changes from a certain value. Since a stronger privacy protection level is achieved for a smaller value of ε, we should select the strongest protection (the smallest ε) that can maintain the prediction accuracy; in this evaluation, the best privacy protection level is the smallest such value of ε.

6 Conclusion

We proposed a location privacy preference recommendation system that uses matrix factorization and achieves privacy protection by local differential privacy. We also examined how to determine the best definition of the location privacy preference. In the evaluation, using an artificial dataset built from a location privacy preference dataset and a trajectory dataset, we assessed our method with three metrics: the false positive rate, recall, and the reconstruction rate.
From the results of the evaluation, we confirm that our method can maintain utility at the same level as a method without privacy preservation. In addition, we confirm how to choose the best parameters: the granularity of the location privacy preference definition, the privacy protection level, and the number of rated items.

Our future works are listed below:

– The best location information granularity in the location privacy preference definition: We examined the definition of time in the location privacy preference. However, it is also necessary to clarify the method of defining the location. We will confirm the effectiveness of two pieces of information: geographical information, such as latitude and longitude, and category information of the place.
– Extension of the recommendation method considering features of the location information: Although we confirmed that the utility was inferior to that of normal matrix factorization, there is room for improvement. One idea is to extend matrix factorization to take the features of the location information into account. The idea is that when a user decides a location privacy preference, if there is one place that he/she wishes to hide, this place influences nearby places. A previous study considered the influence of neighboring places in a system that recommends places to visit [26], although that research did not address location privacy preferences.
– Application to a location privacy protection system: If we use the location privacy preferences predicted by our method, a location privacy preserving system can protect location information without users' input.
– Dealing with data correlations: In the current method, we do not take data correlations into account; however, previous studies [7] [8] show that differential privacy suffers from temporal privacy leakage. We will consider using the techniques provided in [10] to solve this issue.
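The role of the privacy parameter ε can be illustrated with randomized response [40], the basic perturbation mechanism behind many local differential privacy schemes: each user reports his/her true binary preference (release or hide) only with a probability determined by ε. The sketch below is illustrative only; the function names, the simulated population, and the aggregation step are our assumptions, not our system's actual protocol.

```python
import math
import random

def randomized_response(bit: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1),
    and the flipped bit otherwise (Warner's randomized response)."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else not bit

def estimate_true_rate(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from perturbed reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

if __name__ == "__main__":
    random.seed(0)
    # 70% of simulated users are willing to release their location
    true_bits = [True] * 7000 + [False] * 3000
    for eps in (0.1, 0.5, 1.0, 2.0):
        reports = [randomized_response(b, eps) for b in true_bits]
        print(f"eps={eps}: estimated rate = {estimate_true_rate(reports, eps):.3f}")
```

A smaller ε pushes the report probability toward 1/2, which strengthens privacy but makes the aggregate estimate noisier, matching the utility trend observed in the evaluation.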
References
1. Information bank,
2. Abul, O., Bonchi, F., Nanni, M.: Never walk alone: Uncertainty for anonymity in moving objects databases. In: 2008 IEEE 24th International Conference on Data Engineering (ICDE). pp. 376–385. IEEE (2008)
3. Andrés, M.E., Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Geo-indistinguishability: Differential privacy for location-based systems. In: ACM Conference on Computer and Communications Security (2013)
4. Balu, R., Furon, T.: Differentially private matrix factorization using sketching techniques. In: Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security. pp. 57–62. ACM (2016)
5. Berlioz, A., Friedman, A., Kaafar, M.A., Boreli, R., Berkovsky, S.: Applying differential privacy to matrix factorization. In: Proceedings of the 9th ACM Conference on Recommender Systems. pp. 107–114. ACM (2015)
6. Calandrino, J.A., Kilzer, A., Narayanan, A., Felten, E.W., Shmatikov, V.: "You might also like:" Privacy risks of collaborative filtering. In: 2011 IEEE Symposium on Security and Privacy (SP). pp. 231–246. IEEE (2011)
7. Cao, Y., Yoshikawa, M., Xiao, Y., Xiong, L.: Quantifying differential privacy under temporal correlations. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE). pp. 821–832 (Apr 2017)
8. Cao, Y., Yoshikawa, M., Xiao, Y., Xiong, L.: Quantifying differential privacy in continuous data release under temporal correlations. IEEE Transactions on Knowledge and Data Engineering pp. 1–1 (2018). https://doi.org/10.1109/TKDE.2018.2824328
9. Cao, Y., Xiao, Y., Xiong, L., Bai, L.: PriSTE: From location privacy to spatiotemporal event privacy. arXiv preprint arXiv:1810.09152 (2018)
10. Cao, Y., Xiong, L., Yoshikawa, M., Xiao, Y., Zhang, S.: ConTPL: Controlling temporal privacy leakage in differentially private continuous data release. Proceedings of the VLDB Endowment (12), 2090–2093 (2018)
11. Cao, Y., Yoshikawa, M.: Differentially private real-time data release over infinite trajectory streams. In: 2015 16th IEEE International Conference on Mobile Data Management (MDM). vol. 2, pp. 68–73 (2015)
12. Cao, Y., Yoshikawa, M.: Differentially private real-time data publishing over infinite trajectory streams. IEICE Transactions on Information and Systems E99-D(1), 163–175 (2016)
13. De Montjoye, Y.A., Hidalgo, C.A., Verleysen, M., Blondel, V.D.: Unique in the crowd: The privacy bounds of human mobility. Scientific Reports, 1376 (2013)
14. Dwork, C.: Differential privacy: A survey of results. In: International Conference on Theory and Applications of Models of Computation. pp. 1–19. Springer (2008)
15. Fang, L., LeFevre, K.: Privacy wizards for social networking sites. In: Proceedings of the 19th International Conference on World Wide Web. pp. 351–360. ACM (2010)
16. Fawaz, K., Shin, K.G.: Location privacy protection for smartphone users. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. pp. 239–250. ACM (2014)
17. Frey, D., Guerraoui, R., Kermarrec, A.M., Rault, A.: Collaborative filtering under a sybil attack: Analysis of a privacy threat. In: Proceedings of the Eighth European Workshop on System Security. p. 5. ACM (2015)
18. Hua, J., Xia, C., Zhong, S.: Differentially private matrix factorization. In: IJCAI. pp. 1763–1770 (2015)
19. Huo, Z., Meng, X., Hu, H., Huang, Y.: You can walk alone: Trajectory privacy-preserving through significant stays protection. In: DASFAA (2012)
20. Jiang, K., Shao, D., Bressan, S., Kister, T., Tan, K.L.: Publishing trajectories with differential privacy guarantees. In: SSDBM (2013)
21. Jin, H., Saldamli, G., Chow, R., Knijnenburg, B.P.: Recommendations-based location privacy control. In: 2013 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops). pp. 401–404. IEEE (2013)
22. Kanza, Y., Samet, H.: An online marketplace for geosocial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. p. 10. ACM (2015)
23. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer (8), 30–37 (2009)
24. Laboratory, N.D.R.: Consumer survey on concerning personal data (2016)
25. Lam, S., Frankowski, D., Riedl, J.: Do you trust your recommendations? An exploration of security and privacy issues in recommender systems. Emerging Trends in Information and Communication Security pp. 14–29 (2006)
26. Lian, D., Zhao, C., Xie, X., Sun, G., Chen, E., Rui, Y.: GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 831–840. ACM (2014)
27. Liu, Z., Wang, Y.X., Smola, A.: Fast differentially private matrix factorization. In: Proceedings of the 9th ACM Conference on Recommender Systems. pp. 171–178. ACM (2015)
28. Machanavajjhala, A., Korolova, A., Sarma, A.D.: Personalized social recommendations: Accurate or private. Proceedings of the VLDB Endowment (7), 440–450 (2011)
29. McSherry, F., Mironov, I.: Differentially private recommender systems: Building privacy into the Netflix prize contenders. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 627–636. ACM (2009)
30. Nakagawa, H.: Introduction to privacy protection: Legal system and mathematical basis. Keiso Bookstore (2016)
31. Parris, I., Abdesslem, F.B.: CRAWDAD data set st andrews/locshare () (2011)
32. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: An open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work. pp. 175–186. ACM (1994)
33. Ryuji Yokoyama, Y.U.: Real Activity Targeting. Nikkei Business Publishing (2015)
34. Sadeh, N., Hong, J., Cranor, L., Fette, I., Kelley, P., Prabaker, M., Rao, J.: Understanding and capturing people's privacy policies in a mobile social networking application. Personal and Ubiquitous Computing (6), 401–412 (2009)
35. Shen, Y., Jin, H.: Privacy-preserving personalized recommendation: An instance-based approach via differential privacy. In: 2014 IEEE International Conference on Data Mining (ICDM). pp. 540–549. IEEE (2014)
36. Shen, Y., Jin, H.: EpicRec: Towards practical differentially private framework for personalized recommendation. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. pp. 180–191. ACM (2016)
37. Shin, H., Kim, S., Shin, J., Xiao, X.: Privacy enhanced matrix factorization for recommendation with local differential privacy. IEEE Transactions on Knowledge and Data Engineering, 1770–1782 (2018)
38. Sweeney, L.: k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (05), 557–570 (2002)
39. Toch, E.: Crowdsourcing privacy preferences in context-aware applications. Personal and Ubiquitous Computing (1), 129–141 (2014)
40. Warner, S.L.: Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association (309), 63–69 (1965)
41. Wasef, A., Shen, X.S.: REP: Location privacy for VANETs using random encryption periods. Mobile Networks and Applications (1), 172–185 (2010)
42. Xiao, Y., Xiong, L., Zhang, S., Cao, Y.: LocLok: Location cloaking with differential privacy via hidden Markov model. Proc. VLDB Endow. (12), 1901–1904 (Aug 2017)
43. Zhao, Y., Ye, J., Henderson, T.: Privacy-aware location privacy preference recommendations. In: Proceedings of the 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. pp. 120–129. ICST (2014)

A. Data processing
In the evaluation, we generate an artificial dataset by combining a location privacy preference dataset and a trajectory dataset. When joining the datasets, we use the time information and the category information as a key, but the category sets are different. The location privacy preference dataset has the following categories:

{ Food and Drink, Leisure, Retail, Residential, Academic, Library }

The trajectory dataset has the following categories:

{ Community, Entertainment, Food, Nightlife, Outdoors, Shopping, Travel }

Therefore, it is necessary to convert the data so that both datasets have the same category information. The specific generation method is as follows.

1. Preprocessing of the trajectory dataset: We extract from the trajectory dataset the check-in data for places that two or more users visited.
2. Preprocessing of both datasets (conversion of the time information): We split the exact date and time into time slots. For example, when the day is divided into six slots, the slots can be as follows:

{ Morning, Noon, Afternoon, Evening, Night, Midnight }

In the evaluation, we divide the slots using multiple criteria: 2, 3, 4, 6, 8, and 12 hours. For a shorter criterion time, the recommendation handles a larger number of items.
3. Preprocessing of both datasets (conversion of the location information): We convert the category information of the trajectory dataset, unifying the categories into those of the dataset with fewer categories, as follows:
– Community: We convert the information according to its lower category. A category is converted to "Residential" if its lower category is "Home". A category is converted to "Library" if its lower category is "Library". A category with a different lower category is converted to "
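The time-slot conversion in step 2 above can be sketched as follows; the timestamp format, the function names, and the slot labels are illustrative assumptions, not the datasets' actual schema.

```python
from datetime import datetime

# Assumed labels for a 4-hour criterion (six slots per day); other criteria
# (2, 3, 6, 8, or 12 hours) only change the `hours` argument below.
SIX_SLOT_LABELS = ["Midnight", "Morning", "Noon", "Afternoon", "Evening", "Night"]

def to_time_slot(timestamp: str, hours: int) -> int:
    """Map an exact check-in time to a coarse slot index; the slot index is
    then used as part of the join key between the two datasets."""
    assert 24 % hours == 0, "the criterion must divide the day evenly"
    t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return t.hour // hours

def slot_label(timestamp: str) -> str:
    """Human-readable label under the 4-hour criterion."""
    return SIX_SLOT_LABELS[to_time_slot(timestamp, hours=4)]

# e.g. to_time_slot("2011-05-14 09:30:00", hours=6) returns slot 1
```

A shorter criterion yields more slots and hence more (place, slot) items per user, which matches the observation above that the recommendation handles a larger number of items.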