Information filtering via biased heat conduction
arXiv [physics.data-an]
Jian-Guo Liu ∗
Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China and CABDyN Complexity Center, Saïd Business School, University of Oxford, Park End Street, Oxford, OX1 1HP, United Kingdom
Tao Zhou †
Web Sciences Center, University of Electronic Science and Technology of China, Chengdu 610054, People's Republic of China
Qiang Guo
Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China
The heat conduction process has recently been applied to personalized recommendation [T. Zhou et al., PNAS 107, 4511 (2010)], where it yields high diversity but low accuracy. By decreasing the temperatures of small-degree objects, we present an improved algorithm, called biased heat conduction (BHC), which simultaneously enhances accuracy and diversity. Extensive experimental analyses demonstrate that the accuracy on the MovieLens, Netflix and Delicious datasets is improved by 43.5%, 55.4% and 19.2%, respectively, compared with the standard heat conduction algorithm, while the diversity is increased or approximately unchanged. Further statistical analyses suggest that the present algorithm can simultaneously identify users' mainstream and special tastes, resulting in better performance than the standard heat conduction algorithm. This work provides a creditable way toward highly efficient information filtering.
PACS numbers: 89.20.Hh, 89.75.Hc, 05.70.Ln
With the advent of the Internet [1] and the wide application of Web 2.0 techniques, many web sites have sprouted that enable large communities to aggregate and interact. For example, Twitter allows its […] members to share interests and life experiences, and Facebook has exceeded 500 million members since July 16th, 2010; their memberships are growing ever faster. This brings a massive amount of accessible information, more than any individual is able to process. Searching, filtering and recommending have thus become indispensable in the Internet era, in which personalized recommender systems have become an effective tool to address the information overload problem by predicting users' interests and habits based on their historical records. Personalized recommender systems have been used to recommend books and CDs at Amazon.com, movies at Netflix.com, and news at Versifi Technologies (formerly AdaptiveInfo.com) [2]. Motivated by their practical significance to e-commerce, recommender systems have attracted increasing attention and become an essential issue [3, 4]. A personalized recommender system includes three parts: data collection, model analysis and the recommender algorithm, of which the algorithm is the core part. Thus far, various kinds of algorithms have been proposed, including collaborative filtering (CF) approaches [5–10], content-based analyses [11, 12], tag-aware algorithms [13–15], link prediction approaches [16–18], hybrid algorithms [19, 20], and so on. For a review of current progress, see Refs. [2, 21] and the references therein.

FIG. 1: (Color online) Illustration of the heat conduction algorithm on a bipartite user-object network: (a) The objects collected by the target user are activated, with temperature 1, while the others have temperature 0. (b) Each user's temperature is the average over all her/his collected objects. (c) The same process happens from users to objects.
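The three steps in the caption of Fig. 1 can be illustrated with a minimal Python sketch that spreads heat locally on a toy user-object network. The function name `heat_conduction_scores`, the dictionary-of-sets input format and the toy data are assumptions made for this illustration only; they are not taken from the original work.

```python
from collections import defaultdict


def heat_conduction_scores(collections, target_user):
    """Local heat conduction on a bipartite network (hypothetical helper).

    collections: dict mapping each user to the set of objects she/he collected.
    Returns a dict mapping each object NOT collected by target_user to its
    final temperature, which serves as the recommendation score.
    """
    collected = collections[target_user]

    # Step (a): collected objects have temperature 1, all others 0.
    all_objects = set().union(*collections.values())
    obj_temp = {o: 1.0 if o in collected else 0.0 for o in all_objects}

    # Step (b): a user's temperature is the average over her/his objects.
    user_temp = {
        u: sum(obj_temp[o] for o in objs) / len(objs)
        for u, objs in collections.items()
        if objs
    }

    # Step (c): an object's temperature is the average over its collectors.
    collectors = defaultdict(list)
    for u, objs in collections.items():
        for o in objs:
            collectors[o].append(u)
    final_temp = {
        o: sum(user_temp[u] for u in users) / len(users)
        for o, users in collectors.items()
    }

    # Only uncollected objects are candidates for recommendation.
    return {o: t for o, t in final_temp.items() if o not in collected}


# Toy network (illustrative data, not from the paper).
toy = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"a", "c", "d"}}
scores = heat_conduction_scores(toy, "u1")
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case object `c` receives temperature 5/12 and object `d` receives 1/3, so `c` would be recommended to `u1` first.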
A recommender system can be described by a bipartite network [22, 23], in which there are two kinds of nodes: users $U$ and objects $O$. The users' historical records are represented by the edges connecting users and objects. Supposing there are $m$ objects $O = \{o_1, o_2, \cdots, o_m\}$ and $n$ users $U = \{u_1, u_2, \cdots, u_n\}$, the system can be fully described by an adjacency matrix $A = \{a_{l\alpha}\}_{m,n}$, where $a_{l\alpha} = 1$ if $o_\alpha$ is collected by $u_l$, and $a_{l\alpha} = 0$ otherwise. A reasonable assumption is that the objects collected by users are what these users like, and a recommendation algorithm aims at predicting users' personal opinions on the objects they have not yet collected [24–26].

In the standard heat conduction (HC) algorithm, we first construct a propagator matrix $W_h$, where the element $w_{\alpha\beta}$ denotes the conduction rate from object $o_\beta$ to $o_\alpha$. Denote by $H$ the temperature vector of $m$ components: the source components have temperature 1, while the remaining components have temperature 0. The temperatures associated with the remaining nodes can then be calculated by solving the thermal equilibrium equation $W_h H = f$ [26], where $f$ is the flux vector. This is the discrete analog of $-\kappa \nabla^{2} T(\vec{r}) = \vec{\nabla} \cdot \vec{J}(\vec{r})$, where $\kappa$ is the thermal conductivity, $\nabla T(\vec{r})$ is the temperature gradient and $\vec{\nabla} \cdot \vec{J}(\vec{r})$ is the local heat flux. In this paper, $H(i)$ plays the role of $-\kappa T(\vec{r})$ and $f(i)$ plays the role of $\vec{\nabla} \cdot \vec{J}(\vec{r})$ [26]. In the standard HC algorithm, the temperature of the collected objects is constant, and the heat diffuses from objects to users, and then from users to objects. The temperatures of the uncollected objects are then taken as recommendation scores: the objects given higher temperatures are recommended preferentially (see Fig. 1 for an illustration). Since the HC algorithm [26] is implemented with matrix operations, it is very time-consuming and cannot be applied to large-scale systems. Zhou et al. [4] proposed a local HC algorithm, which spreads the heat on the user-object bipartite network and can quickly generate highly diverse yet less accurate recommendations. As a benchmark for comparison, we call it the standard HC algorithm (hereinafter, HC stands only for the local heat conduction algorithm [4]).

In this Brief Report, we present the biased heat conduction (BHC) algorithm to see how objects' degrees affect the algorithmic performance. Using data from three real systems (MovieLens, Netflix and Delicious), we show that giving the large-degree objects higher temperatures than the standard HC algorithm does can generate highly accurate and diverse recommendations.

To test the performance of a recommendation algorithm, we randomly divide a given data set into two parts: the training set and the probe set. The information contained in the probe set is not allowed to be used for recommendation; namely, we provide a recommendation list for each user based only on the training set. In this Brief Report, we always keep 90% of the links in the training set and 10% of the links in the probe set, and employ three different metrics to measure the accuracy, novelty and diversity of recommendations.

Accuracy [25]. A good recommender algorithm should rank preferable objects that match the user's tastes in higher positions, i.e., the objects in the probe set (those actually collected by users) should be put in high positions of the recommendation list. For a user $u_i$, if the entry $u_i$-$o_j$ is in the probe set, we measure the position of $o_j$ in the ordered list for $u_i$. For example, if $o_j$ is the 3rd one from the top among the objects uncollected by $u_i$, its position $r_{ij}$ is 3 divided by the number of those uncollected objects. A good algorithm is expected to give small $r_{ij}$. Therefore, the mean value of the position, $\langle r \rangle$, over all entries in the probe set can be used to evaluate the algorithmic accuracy: the smaller the average ranking score [25], the higher the algorithmic accuracy.

Novelty and diversity [27].
Since there are countless channels to obtain information about popular objects, uncovering very specific preferences, corresponding to unpopular objects, is much more significant than simply picking out what a user likes from the list of best sellers [4]. To measure this factor, we go simultaneously in two directions: novelty (measured by popularity) and diversity (measured by Hamming distance). The popularity is defined as the average degree of all recommended objects, $\langle k \rangle$. Since it is hard for users to find unpopular objects, a good algorithm should prefer to recommend objects with small average degree. In addition, a personalized recommendation algorithm should present different recommendation lists to different users according to their tastes and habits. The diversity is quantified by the Hamming distance $S = \langle H_{ij} \rangle$, where $H_{ij} = 1 - Q_{ij}(L)/L$, $L$ is the length of the recommendation list and $Q_{ij}(L)$ is the number of overlapped objects in the recommendation lists of $u_i$ and $u_j$. A larger $S$ corresponds to higher diversity.

Three benchmark datasets, named MovieLens, Netflix and Delicious (see Table I for basic statistics), are used to test the present algorithm. The Netflix data set is a random sample of the huge dataset provided for the Netflix Prize [30], and the Delicious data set is obtained by downloading publicly available data from the social bookmarking web site Delicious.com (taking care to anonymize user identity in the process). The Delicious data is inherently unary, while both the MovieLens and Netflix data sets contain explicit ratings from one to five. We apply a coarse-graining method to transform them into unary form: an object is considered to be collected by a user only if the given rating is larger than 2. The sparsity of a data set is defined as the number of links divided by the total number of user-object pairs.

TABLE I: Basic statistics of the tested data sets.
Data Sets    Users     Objects    Links        Sparsity
MovieLens    1,574     943        82,520       […]
Netflix      10,000    6,000      701,947      […]
Delicious    10,000    232,657    1,233,997    […]

TABLE II: Algorithmic performance of the standard HC algorithm [4] on the MovieLens, Netflix and Delicious data sets. The popularity $\langle k \rangle$ and diversity $S$ are obtained at $L = 10$.
Data Sets    $\langle r \rangle$    $\langle k \rangle$    $S$
MovieLens    0.15156    3.085    0.88196
Netflix      0.10629    1.344    0.86296
Delicious    0.26129    1.915    0.98066

Applying the standard HC algorithm to the MovieLens, Netflix and Delicious data sets gives the values of $\langle r \rangle$, $\langle k \rangle$ and $S$ shown in Table II. Although the accuracy of the standard HC algorithm is poor, it provides highly diverse recommendations. We argue that the lower accuracy of the standard HC algorithm lies in the fact that it assigns overwhelming priority to the small-degree objects, leading to a strong bias.
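The evaluation quantities defined above (coarse-graining of ratings, sparsity, ranking score, popularity and Hamming distance) can be sketched in a few lines of Python. All function names and the toy inputs below are assumptions introduced for illustration; the sketch follows the definitions in the text and is not the authors' code.

```python
from itertools import combinations


def coarse_grain(ratings, threshold=2):
    """Keep a (user, object) link only if the rating is larger than threshold."""
    return {(u, o) for u, o, r in ratings if r > threshold}


def sparsity(n_links, n_users, n_objects):
    """Number of links divided by the total number of user-object pairs."""
    return n_links / (n_users * n_objects)


def ranking_score(ranked_uncollected, probe_object):
    """r_ij: relative position of a probe object in the ordered list of
    objects that the user has not collected (1-based rank / list length)."""
    rank = ranked_uncollected.index(probe_object) + 1
    return rank / len(ranked_uncollected)


def mean_popularity(rec_lists, degree):
    """<k>: average degree of all recommended objects over all users."""
    degrees = [degree[o] for rec in rec_lists.values() for o in rec]
    return sum(degrees) / len(degrees)


def hamming_diversity(rec_lists, L):
    """S = <H_ij> with H_ij = 1 - Q_ij(L) / L, averaged over user pairs."""
    distances = []
    for u, v in combinations(rec_lists, 2):
        overlap = len(set(rec_lists[u][:L]) & set(rec_lists[v][:L]))
        distances.append(1 - overlap / L)
    return sum(distances) / len(distances)


# Toy usage (illustrative data only).
links = coarse_grain([("u1", "a", 5), ("u1", "b", 2), ("u2", "a", 3)])
recs = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["d", "e"]}
S = hamming_diversity(recs, L=2)
```

The average ranking score $\langle r \rangle$ is then the mean of `ranking_score` over all user-object entries in the probe set.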
Therefore, the standard HC algorithm could be improved by reinforcing the influence of the large-degree objects. In the last step of the standard HC algorithm, all of the heat an object has received is divided by its degree. Although the large-degree objects may receive lots of heat, their temperatures are very low, while small-degree objects obtain high temperatures and are thus put in the top positions of recommendation lists. A clear advantage of the standard HC algorithm is its ability to dig out the non-mainstream tastes that can hardly be found by classical methods. However, users generally like popular objects, and thus an algorithm should also give those objects a chance. We therefore propose the BHC algorithm, which takes the object degree effect into account in the last diffusion step. For a target object $o_\alpha$, instead of dividing by its degree $k(o_\alpha)$, the final temperature is obtained by dividing by $k^{\lambda}(o_\alpha)$. The element $w_{\alpha\beta}$ of the matrix $W_h$ then reads
$$w_{\alpha\beta} = \frac{1}{k^{\lambda}(o_\alpha)} \sum_{l=1}^{n} \frac{a_{l\alpha} a_{l\beta}}{k(u_l)}.$$
Compared with the standard HC algorithm (i.e., $\lambda = 1$), the influence of large-degree objects is strengthened if $\lambda < 1$ and depressed if $\lambda > 1$.

[Figure 2: panel columns Delicious, Netflix, MovieLens; panel rows $\langle r \rangle$, $\langle k \rangle$, $S$; horizontal axis $\lambda$.]
FIG. 2: (Color online) Performance of the BHC algorithm on the MovieLens, Netflix and Delicious data sets. Plots (a)-(c) show the average ranking score $\langle r \rangle$ vs. $\lambda$. Subject to $\langle r \rangle$, the optimal $\lambda_{\rm opt}$ are 0.84, 0.85 and 0.50, and the corresponding $\langle r \rangle_{\rm opt}$ are 0.0852, 0.0474 and 0.2112. Plots (d)-(f) display the results for $\langle k \rangle$ and (g)-(i) for $S$ with $L = 10$. All the data points are averaged over ten independent runs with different divisions of training-probe sets.

A summary of the primary results for the BHC algorithm is given in Table III. Figure 2(a-c) reports the algorithmic accuracy $\langle r \rangle$ as a function of $\lambda$, from which one can see that the curves obtained by BHC have clear minima. For example, the optimal parameter for the MovieLens data is around $\lambda_{\rm opt} = 0.84$, strongly supporting our argument that the effects of large-degree objects should be increased. Compared with the standard case (i.e., $\lambda = 1$), the average ranking score $\langle r \rangle$ is reduced from 0.1516 to 0.0852 (improved by 43.5%).
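To make the role of $\lambda$ in the propagator element $w_{\alpha\beta}$ concrete, the following Python sketch builds the matrix explicitly from a small adjacency matrix and compares $\lambda = 1$ (standard HC) with $\lambda = 0.5$. The function names `bhc_propagator` and `bhc_scores`, the row-per-user matrix layout and the toy matrix are assumptions made for this illustration; the explicit matrix form is chosen for readability and is not meant to be efficient.

```python
def bhc_propagator(a, lam):
    """Build W with w[alpha][beta] =
    (1 / k(o_alpha)**lam) * sum_l a[l][alpha] * a[l][beta] / k(u_l).

    a: list of rows, one per user l; a[l][alpha] is 1 if user l collected
    object alpha and 0 otherwise (hypothetical input layout).
    """
    n_users, n_objects = len(a), len(a[0])
    k_user = [sum(row) for row in a]
    k_obj = [sum(a[l][alpha] for l in range(n_users)) for alpha in range(n_objects)]
    w = [[0.0] * n_objects for _ in range(n_objects)]
    for alpha in range(n_objects):
        if k_obj[alpha] == 0:
            continue  # isolated object: no heat can reach it
        for beta in range(n_objects):
            total = sum(
                a[l][alpha] * a[l][beta] / k_user[l]
                for l in range(n_users)
                if k_user[l] > 0
            )
            w[alpha][beta] = total / k_obj[alpha] ** lam
    return w


def bhc_scores(a, user, lam=1.0):
    """Temperatures of the objects not yet collected by `user`."""
    h = a[user]  # initial temperatures: 1 for collected objects, 0 otherwise
    w = bhc_propagator(a, lam)
    n_objects = len(h)
    return {
        alpha: sum(w[alpha][beta] * h[beta] for beta in range(n_objects))
        for alpha in range(n_objects)
        if h[alpha] == 0
    }


# Toy matrix (illustrative): object 2 is popular (degree 3), object 3 is niche.
toy = [
    [1, 1, 0, 0],  # target user
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
standard = bhc_scores(toy, user=0, lam=1.0)
biased = bhc_scores(toy, user=0, lam=0.5)
```

In this toy network the niche object 3 outranks the popular object 2 at $\lambda = 1$, while at $\lambda = 0.5$ the order is reversed, which is the qualitative effect the biased normalization is designed to produce.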
These results indicate that giving more opportunities to the large-degree objects greatly increases the algorithmic accuracy. More interestingly, when $L = 10$, the Hamming distance of MovieLens is also improved from 0.8820 to 0.9248 (see Fig. 2(i)), which is even better than the 0.9173 obtained by the hybrid algorithm [4]. Actually, the standard HC algorithm prefers to give more opportunities to the small-degree objects and ranks them at the top positions of many users' recommendation lists. Therefore, its Hamming distance may not be the highest although its popularity is the lowest.

TABLE III: Algorithmic performance of the BHC algorithm. The Hamming distance corresponds to $L = 10$.
Data Sets    $\lambda_{\rm opt}$    $\langle r \rangle_{\rm opt}$    Improvement    $S_{\rm opt}$
MovieLens    0.84    0.0852    43.5%    0.9248
Netflix      0.85    0.0474    55.4%    0.8200
Delicious    0.50    0.2112    19.2%    0.9795

[Figure 3: Netflix data. Panel titles: (a) Degree distribution, (b) Mass diffusion, (c) Heat conduction, (d) $\lambda_{\rm opt} = 0.84$; axes $n(k)$ vs. $k$.]
FIG. 3: Plot (a) shows the object degree distribution of the Netflix data, and (b)-(d) show the correlations between the occurrence number $n(k)$ and the object degree $k$ for the MD, standard HC and BHC algorithms when $L = 10$. The results for MovieLens and Delicious are similar.

Figure 2(b,e,h) shows similar results on Netflix, where the optimal parameter is $\lambda_{\rm opt} = 0.85$. The results for MovieLens and Netflix are very close to each other, since both data sets are movie-related and their sparsities are close. The optimal parameter $\lambda_{\rm opt}$ on Delicious (see Fig. 2(a,d,g)) equals 0.5, with very small $\langle k \rangle$ and very high $S$ ($\approx 0.98$). Both the optimal ranking score $\langle r \rangle_{\rm opt} = 0.2112$ and the Hamming distance $S = 0.9795$ of Delicious are much larger than those of MovieLens and Netflix. The reasons are twofold: the higher sparsity of edges and the larger number of objects.
The former leads to less accurate recommendation while the latter results in higher diversity.

Table IV reports the performance obtained by several algorithms on the MovieLens dataset, from which one can see that the accuracy $\langle r \rangle$ of the BHC algorithm is close to the result of the HO-CF algorithm, which needs to compute second-order similarity information, and that the diversity of the BHC algorithm is among the highest, second only to WHC.

TABLE IV: Algorithmic performance for the MovieLens data. $\langle k \rangle$ and $S$ correspond to $L = 10$. MD is the abbreviation of the algorithm proposed in Ref. [25]; Heter-NBI, HO-CF, IMCF and WHC are the abbreviations of the algorithm with heterogeneous initial resource distribution proposed in Ref. [27], the high-order collaborative filtering (CF) algorithm proposed in Ref. [28], the improved modified CF algorithm in Ref. [29] and the algorithm presented in Ref. [23], respectively.
Algorithms    $\langle r \rangle$    $S$       $\langle k \rangle$
MD            0.1060    0.617     233
HC            0.1516    0.750     3.09
Heter-NBI     0.1010    0.682     220
HO-CF         0.0826    0.9127    237
IMCF          0.0877    0.826     175
WHC           0.0914    0.941     179
BHC           0.0852    0.925     197

To explain why both accuracy and diversity can be enhanced by the BHC algorithm, we investigate the frequency of appearance $n(k)$ of objects of degree $k$ in all users' recommendation lists. We show the results for a typical example, Netflix, where the length of the recommendation list is $L = 10$. Different from the power-law degree distribution in Fig. 3(a), $n(k)$ of the BHC algorithm has a butterfly shape, which means that objects with large or small degrees are recommended more frequently. Figure 3(b) shows that the mass diffusion algorithm prefers to recommend the large-degree objects, while Fig. 3(c) shows that the standard HC algorithm gives higher recommendation scores to the small-degree objects, so the popular objects are largely depreciated. Comparing Fig. 3(c) with Fig. 3(d), at the optimal $\lambda_{\rm opt}$ both small-degree and large-degree objects are recommended with high frequency by the BHC algorithm. In a word, the advantage of BHC is that it can not only dig out users' very special tastes, but also find the objects of common interest.

In this Brief Report, we propose a biased heat conduction algorithm by considering the degree effects in the last step of the local heat conduction process [4], which greatly improves the accuracy of the standard HC algorithm. In the standard HC algorithm, the small-degree objects are recommended overwhelmingly because, in the last step, the received heat is divided by the object degree to calculate the temperature. This division largely depresses the chance of a large-degree object being recommended. In contrast, the power-law object degree distribution indicates that large-degree objects are preferred by many users; therefore a good algorithm should also pay attention to them.
In addition, a personalized recommender system should provide each user with recommendations according to his/her own interests and habits. Therefore, the diversity of recommendation lists plays a crucial role in quantifying the personalization. The numerical results show that the recommendation lists generated by the BHC algorithm have competitively high diversity and remarkably higher accuracy than those generated by the standard HC algorithm. The statistical results on Facebook applications also show that the objects can be divided into two categories [31]. One category is collected by almost all users, while the others are collected only by small groups of users, which indicates that users' tastes can be expressed by two categories: popular and special. Therefore, the reason why BHC produces higher accuracy is that these two kinds of user interests can be identified simultaneously. However, how to track users' current popular and special tastes in a timely manner is still an open problem.

We acknowledge the GroupLens Research Group for providing us the MovieLens data and Netflix Inc. for the Netflix data. This work is partially supported by the European Commission FP7 Future and Emerging Technologies Open Scheme Project ICTeCollective (Contract 238597) and the National Natural Science Foundation of China (Grant Nos. 10905052 and 60973069). JGL is supported by the Shanghai Leading Discipline Project (No. S30501) and the Shanghai Rising-Star Program (11QA1404500).

[1] G.-Q. Zhang, G.-Q. Zhang, Q.-F. Yang, S.-Q. Cheng, T. Zhou, New J. Phys., 123027 (2008).
[2] G. Adomavicius and A. Tuzhilin, IEEE Trans. Know. & Data Eng., 734 (2005).
[3] J. B. Schafer, J. A. Konstan, and J. Riedl, Data Mining & Knowledge Discovery, 115 (2001).
[4] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang, Proc. Natl. Acad. Sci. U.S.A., 4511 (2010).
[5] J. L. Herlocker, J. A. Konstan, K. Terveen, and J. Riedl, ACM Trans. Inform. Syst., 5 (2004).
[6] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl, Commun. ACM, 77 (1997).
[7] J.-G. Liu, B.-H. Wang, and Q. Guo, Int. J. Mod. Phys. C, 285 (2009).
[8] J.-G. Liu, T. Zhou, H.-A. Che, B.-H. Wang, and Y.-C. Zhang, Physica A, 881 (2010).
[9] D. Sun, T. Zhou, J.-G. Liu, R.-R. Liu, C.-X. Jia, and B.-H. Wang, Phys. Rev. E, 017101 (2009).
[10] J.-G. Liu, T. Zhou, B.-H. Wang, Y.-C. Zhang, and Q. Guo, Int. J. Mod. Phys. C, 137 (2009).
[11] M. Balabanović and Y. Shoham, Commun. ACM, 66 (1997).
[12] M. J. Pazzani, Artif. Intell. Rev., 393 (1999).
[13] M.-S. Shang and Z.-K. Zhang, Chin. Phys. Lett., 118903 (2009).
[14] Z.-K. Zhang, T. Zhou, and Y.-C. Zhang, Physica A, 179 (2010).
[15] M.-S. Shang, Z.-K. Zhang, T. Zhou, and Y.-C. Zhang, Physica A, 1259 (2010).
[16] T. Zhou, L. Lü, and Y.-C. Zhang, Eur. Phys. J. B, 623 (2009).
[17] L. Lü and T. Zhou, Europhys. Lett., 18001 (2010).
[18] L. Lü and T. Zhou, Physica A, 1150 (2011).
[19] M. Pazzani and D. Billsus, Machine Learning, 313 (1997).
[20] N. Good, J. B. Schafer, J. A. Konstan, A. l. Borchers, B. Sarwar, J. Herlocker, and J. Riedl, in Proceedings of the sixteenth national conference on Artificial Intelligence, 1999, p. 439.
[21] J.-G. Liu, M. Z.-Q. Chen, J. Chen, F. Deng, H.-T. Zhang, Z.-K. Zhang, and T. Zhou, Int. J. Inf. Syst. Sci., 230 (2009).
[22] M.-S. Shang, L. Lü, Y.-C. Zhang, and T. Zhou, Europhys. Lett., 48006 (2010).
[23] J.-G. Liu, Q. Guo, and Y.-C. Zhang, Physica A, 2414 (2011).
[24] Y.-C. Zhang, M. Medo, J. Ren, T. Zhou, T. Li, and F. Yang, Europhys. Lett., 68003 (2008).
[25] T. Zhou, J. Ren, M. Medo, and Y.-C. Zhang, Phys. Rev. E, […], 154301 (2007).
[27] T. Zhou, L.-L. Jiang, R.-Q. Su, and Y.-C. Zhang, Europhys. Lett., 58004 (2008).
[28] T. Zhou, R.-Q. Su, R.-R. Liu, L.-L. Jiang, B.-H. Wang, and Y.-C. Zhang, New J. Phys., 123008 (2009).
[29] J.-G. Liu, T. Zhou, H.-A. Che, B.-H. Wang, and Y.-C. Zhang, Physica A, 881 (2010).
[30] J. Bennett and S. Lanning, in Proceedings of the KDD Cup Workshop, New York, 2010, p. 3.
[31] J. P. Onnela and F. Reed-Tsochas, Proc. Natl. Acad. Sci. U.S.A. 107