Predicting human preferences using the block structure of complex social networks
Roger Guimera, Alejandro Llorente, Esteban Moro, Marta Sales-Pardo
Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain; Departament d'Enginyeria Química, Universitat Rovira i Virgili, Tarragona, Catalonia, Spain; Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, Madrid, Spain; Departamento de Matemáticas & Grupo Interdisciplinar de Sistemas Complejos, Universidad Carlos III de Madrid, Leganés, Madrid, Spain; Instituto de Ciencias Matemáticas, Consejo Superior de Investigaciones Científicas-Universidad Autónoma de Madrid-Universidad Complutense de Madrid-Universidad Carlos III de Madrid, Madrid, Spain
Abstract
With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a "new" computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Besides, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups.
Citation:
Guimerà R, Llorente A, Moro E, Sales-Pardo M (2012) Predicting Human Preferences Using the Block Structure of Complex Social Networks. PLoS ONE 7(9): e44620. doi:10.1371/journal.pone.0044620
Editor:
Santo Fortunato, Aalto University, Finland
Received
July 2, 2012;
Accepted
August 6, 2012;
Published
September 11, 2012
Copyright: © 2012 Guimerà et al.
Funding:
This work was supported by a James S. McDonnell Foundation Research Award (RG and MSP), grants PIRG-GA-2010-277166 (RG) and PIRG-GA-2010-268342 (MSP) from the European Union, and grants FIS2010-18639 (RG and MSP), FIS2006-01485 (MOSAICO) (EM) and FIS2010-22047-C05-04 (EM) from the Spanish Ministerio de Economía y Competitividad. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests:
The authors have declared that no competing interests exist.
* E-mail: [email protected]
Introduction
Humans generate information at an unprecedented pace, with some estimates suggesting that in a year we now produce on the order of $10^{21}$ bytes of data, millions of times the amount of information in all the books ever written [1]. In this context, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. This explains the surge, during the last years, of research on recommender systems, which aim to identify items (movies or books, for example) that are potentially interesting to a given individual [2–4].

However, understanding and ultimately predicting human preferences and behaviors is also important from a fundamental point of view. Indeed, the digital traces that we leave with all sorts of everyday activities (shopping, communicating with others, traveling) are ushering in a new kind of computational social science [5,6], which aims to shed light on human mobility [7,8], activity patterns [9], decision-making processes [10], social influence [11–13], and the impact of all these on collective human behavior [14,15].

Existing recommender systems are good at solving the practical problem of providing quick estimates of individuals' preferences, but they often emphasize computational performance over other important questions, such as whether the algorithms are mathematically well-grounded or whether the implicit models and assumptions are easy to interpret (and therefore to modify and fine-tune). In contrast, algorithms that are based on plausible, easily-interpretable assumptions and on solid mathematical grounds are useful in themselves and, arguably, hold the most potential to advance the solution of the problem at both the fundamental and practical levels.
Here we present one such approach and show that it performs better than state-of-the-art recommender systems.

In particular, we focus on what is called collaborative filtering [16], namely making predictions about preferences based on preferences previously expressed by users. The underlying assumption in virtually all collaborative filtering approaches is that similar people have similar "interactions" with similar items. This consideration is usually taken into account heuristically. For example, in memory-based methods [16], one tries to identify users that are similar to the one for which we seek a prediction, or items that are similar to the target item; from these "neighbors" one then obtains a weighted average. In matrix factorization approaches [17], one assumes that each user and item can be characterized by a low-dimensional "feature vector," and that the rating of an item by a user is the product of their feature vectors.

In contrast, we base our predictions on a family of models [18–21] that have been developed and are widely used by sociologists as plausible models of complex social networks, that is, of how social actors establish relationships (friendship relationships with each other, or membership relationships with institutions, for example). In this family of models, social actors are divided into groups, and relationships between two actors are established depending solely on the groups to which they belong. Because of their simplicity and their explanatory power, these models are increasingly being studied as general models of complex (not necessarily social) networks [22–24].

In the context of predicting human preferences, block models assume that users and items can be simultaneously classified into categories, and that the category of the user and the category of the item fully determine the rating. Therefore, the model is extremely easy to interpret.
Additionally, our algorithm is mathematically sound because it uses a Bayesian approach that deals rigorously with the uncertainty associated with the models that could potentially account for observed users' ratings. Indeed, our approach averages over the ensemble of all possible groupings of users and items, exploiting the formal analogies that exist between statistical inference and statistical physics [25]. Finally, our algorithm sheds light on the factors determining preferences because it allows one to study the groupings that have the most explanatory power or that accurately account for certain features of the users' ratings.

Bayesian Predictions Based on the Ensemble of Stochastic Block Models
Consider the observed ratings $R^O$, whose element $r^O_{ui}$ represents the rating of user $u$ on item $i$ (Fig. 1). Note that not all elements in this "matrix" are defined, since only some pairs $(u,i)$ are actually observed; we call $O$ the set of observed $(u,i)$ pairs. As in collaborative filtering approaches [2,3], we assume that these observations are all the information that the algorithm can use to make predictions about unobserved ratings (in other words, we do not use any information about users or items other than past ratings). Our problem is then to estimate the probability $p(r_{ui}=r\,|\,R^O)$ that the unobserved rating of item $i$ by user $u$ is $r_{ui}=r$, given the observation $R^O$.

Let us assume that the observed ratings can be explained by one of the models in a family $\mathcal{M}$ of generative models. Then,

$$p(r_{ui}=r\,|\,R^O) = \int_{\mathcal{M}} dM\; p(r_{ui}=r\,|\,M)\, p(M\,|\,R^O), \qquad (1)$$

where $p(r_{ui}=r\,|\,M)$ is the probability that $r_{ui}=r$ if the ratings were actually generated using model $M$, and $p(M\,|\,R^O)$ is the plausibility of model $M$ given the observation. Using Bayes' theorem, Eq. (1) becomes

$$p(r_{ui}=r\,|\,R^O) = \frac{\int_{\mathcal{M}} dM\; p(r_{ui}=r\,|\,M)\, p(R^O\,|\,M)\, p(M)}{\int_{\mathcal{M}} dM'\; p(R^O\,|\,M')\, p(M')}, \qquad (2)$$

where $p(R^O\,|\,M)$ is the probability that model $M$ gives rise to $R^O$ among all possible ratings (the likelihood of the model), and $p(M)$ is the a priori probability that model $M$ is the correct one (the prior). This equation is formally equivalent to those derived in the context of network inference [22] and, more broadly, to those used in Bayesian model averaging [26].

Although Eq. (2) is the correct probabilistic treatment of $R^O$ for inference of unobserved ratings, in practice predictions will only be accurate if the models in $\mathcal{M}$ (or at least some of them) correctly describe how users actually rate items. Additionally, the models need to be simple enough that they are analytically or computationally tractable. We consider the family $\mathcal{M}_{\rm SBM}$ of stochastic block models [19–22].
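For toy-sized data, the model average over stochastic block models can be evaluated exactly by enumerating every partition of users and items. The sketch below is our own illustration (not the authors' implementation); it applies the closed-form block-model expressions of Eqs. (4) and (5), which are derived below, to a handful of binary ratings:

```python
import math

def partitions(elements):
    """Generate all partitions of a list into non-empty groups."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for k in range(len(smaller)):
            yield smaller[:k] + [smaller[k] + [first]] + smaller[k + 1:]
        yield smaller + [[first]]

def predict(ratings, u, i, K):
    """p(r_ui = r | R^O) by exhaustive summation over all user and item
    partitions (Eq. (4)); only feasible for toy-sized data."""
    users = sorted({x for x, _ in ratings})
    items = sorted({y for _, y in ratings})
    prob, Z = [0.0] * K, 0.0
    for pu in partitions(users):
        for pi in partitions(items):
            gu = {x: a for a, grp in enumerate(pu) for x in grp}
            gi = {y: b for b, grp in enumerate(pi) for y in grp}
            # n[(a, b)][r]: number of r-ratings from user group a to item group b.
            n = {(a, b): [0] * K for a in range(len(pu)) for b in range(len(pi))}
            for (x, y), r in ratings.items():
                n[(gu[x], gi[y])][r] += 1
            # Hamiltonian of Eq. (5): sum over group pairs of
            # ln (n_ab + K - 1)! - sum_k ln (n^k_ab)!
            H = sum(math.lgamma(sum(nk) + K) - sum(math.lgamma(c + 1) for c in nk)
                    for nk in n.values())
            w = math.exp(-H)
            Z += w
            nk = n[(gu[u], gi[i])]
            for r in range(K):
                prob[r] += w * (nk[r] + 1) / (sum(nk) + K)
    return [x / Z for x in prob]

# Toy binary (0 = dislike, 1 = like) example: user D has rated only item b.
ratings = {("A", "a"): 1, ("A", "b"): 1, ("B", "a"): 1,
           ("C", "a"): 0, ("C", "b"): 0, ("D", "b"): 0}
p = predict(ratings, "D", "a", K=2)
```

Because the number of partitions grows as the Bell number, this brute-force version is only a sanity check; for realistic data sizes one must resort to the Metropolis sampling described below.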
In a stochastic block model, users and items are partitioned into groups, and the probability that a user rates an item with $r_{ui}=r$ depends, exclusively, on the groups $s_u$ and $s_i$ to which the user and the item belong, that is,

$$p(r_{ui}=r\,|\,M) = q_r(s_u,s_i) \in [0,1], \qquad (3)$$

with $\sum_r q_r(s_u,s_i) = 1$.

Consider the case in which ratings can take $K$ different values $r \in \{1,\dots,K\}$ (we use the labels $1,\dots,K$ for simplicity, but the only requirement is that there are $K$ non-overlapping classes, which do not need to be ordinal). Under the assumption of no prior knowledge about the models ($p(M)={\rm const.}$), one can partially integrate Eq. (2) (see Methods) to obtain

$$p_{\rm SBM}(r_{ui}=r\,|\,R^O) = \frac{1}{Z} \sum_{P_U\in\mathcal{P}_U,\, P_I\in\mathcal{P}_I} \frac{n^r_{s_u s_i}+1}{n_{s_u s_i}+K}\; e^{-H(P_U,P_I)}, \qquad (4)$$

where the sum is over all possible partitions of users and items into groups ($\mathcal{P}_U$ and $\mathcal{P}_I$, respectively), $n^r_{s_u s_i}$ is the number of $r$-ratings observed from users in group $s_u$ to items in group $s_i$, and $n_{s_u s_i} = \sum_{k=1}^K n^k_{s_u s_i}$ is the total number of observed ratings from users in $s_u$ to items in $s_i$. The "Hamiltonian" $H(P_U,P_I)$, which weights the contribution of each partition, depends only on the partition,

$$H(P_U,P_I) = \sum_{a,b} \left[ \ln\, (n_{ab}+K-1)! \;-\; \sum_{k=1}^K \ln\, (n^k_{ab})! \right], \qquad (5)$$

and $Z = \sum e^{-H}$ is the partition function.

Although carrying out the exhaustive summation over all partitions in Eq. (4) is unfeasible, one can estimate $p_{\rm SBM}(r_{ui}=r\,|\,R^O)$ using Metropolis sampling [22,25,27]. Given these probabilities, our prediction for a given rating is the one that maximizes the probability,

$$r^*_{ui} = \arg\max_r\; p_{\rm SBM}(r_{ui}=r\,|\,R^O). \qquad (6)$$

Benchmark Algorithms
To test how accurately our stochastic block model (SBM) algorithm predicts human preferences, we compare its performance to that of some of the most accurate algorithms in the collaborative filtering literature (see Methods for details) [4]. First, we consider a matrix factorization method [17] based on singular value decomposition (SVD) [28], which uses stochastic gradient descent to minimize the deviations between model predictions and observed ratings [17]. We use two implementations of this algorithm: our own implementation (SVD1) as well as a highly optimized implementation provided by the LensKit framework (SVD2) [4]. Second, we consider an algorithm based on the similarity between items [4,29], and again use the LensKit implementation (Item-Item). Additionally, we consider a baseline naive recommender, where the rating of an item by a user is simply the average rating of the item by all users that have rated it before [4].
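For concreteness, the naive baseline amounts to the following item-mean rule (a minimal sketch we add for illustration; variable names are ours):

```python
from collections import defaultdict

def naive_predict(ratings):
    """Baseline recommender: predict for every item its average observed
    rating, independently of the user asking for the prediction."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_, item), r in ratings.items():
        sums[item] += r
        counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}

ratings = {("A", "a"): 5, ("B", "a"): 3, ("A", "b"): 2}
means = naive_predict(ratings)  # item "a": (5 + 3) / 2 = 4.0; item "b": 2.0
```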
Results
Performance Comparison on Model Ratings
To investigate how our approach performs compared to the benchmark algorithms, and in what situations it works better or worse, we start by generating model dichotomous like/dislike ratings as follows. First, each item $i$ is assigned an intrinsic quality $Q_i \in [0,1]$. Additionally, items and users are partitioned into groups, and each user $u$ has an a priori preference $P(s_u,s_i) \in [0,1]$ for item $i$, where $s_u$ and $s_i$ are the user and item groups, respectively. Then, the probability that $u$ rates $i$ with $r=1$ ("like," as opposed to $r=0$, "dislike") is

$$p(r_{ui}=1) = Q_i^{(1-a)}\, P(s_u,s_i)^{a}, \qquad (7)$$

where $a \in [0,1]$ is a parameter that enables us to interpolate between a situation in which the intrinsic quality of the item is the only relevant factor ($a=0$) and a situation in which a priori preferences are the only relevant factor ($a=1$).

In Fig. 2, we show the performance of the different algorithms when applied to model ratings. When the intrinsic quality is the dominant factor in user ratings (small $a$), all algorithms perform similarly well. Of note, in the limiting case where intrinsic quality is the only relevant factor ($a=0$), the naive recommender is the optimal predictor and does indeed perform slightly better than the others. Conversely, when a priori preferences start playing a significant role (larger $a$), the algorithms start to differ in their performance. As expected, the naive recommender performs poorly in this regime and becomes totally uninformative when $a=1$. The performances of the other algorithms are closer, but SBM is significantly and consistently the most accurate.

Of course, for $a=1$ model ratings are generated according to a block model, so the SBM approach is expected to work best. However, it is worth pointing out that, at least for these model ratings, the most advanced collaborative filtering approaches are never the most accurate, regardless of the value of $a$: either they perform slightly worse than the naive recommender, or they perform significantly worse than the SBM. Since these collaborative filtering approaches are known to be much more accurate on real data than the naive approach (indeed, they are consistently the most accurate among collaborative filtering methods in the literature [4]), our results on model ratings suggest that the SBM algorithm has the potential to provide good estimates on real data. Additionally, our approach also seems to be the most robust, because it never provides estimates that are significantly worse than those produced by any other algorithm.

Figure 1. Predicting preferences using stochastic block models. (A) Users A–H rate movies a–h as indicated by the colors of the links. (B–C) Matrix representation of the ratings; patterned gray elements represent unobserved ratings. Different partitions of the nodes into groups (indicated by the dashed lines) provide different explanations for the observed ratings. The partition in (B) has much explanatory power (low $H$) because ratings in each pair of user-item groups are very homogeneous. For example, it seems plausible that C would rate item a with a 2, given that all users in the {C, D, G} group give a 2 to all items in group {a, b, g}. Conversely, the partition in (C) has very little explanatory power. According to Eq. (4), the predictions of (B) contribute much more than those of (C) to the inference of unobserved ratings. doi:10.1371/journal.pone.0044620.g001

Figure 2. Algorithm comparison for model ratings. We show the prediction accuracy (that is, the fraction of correct rating predictions) as a function of the parameter $a$ that measures the importance of a priori preferences as opposed to intrinsic item quality (see text for details). The black line represents the optimal prediction accuracy, which would be obtained if the algorithms were able to estimate exactly the probability of each rating. For all the simulations we use $n_u$ users organized in 5 groups, $n_i$ items organized in 5 groups, and $P(s_u,s_i)$ uniformly distributed in $[0,1]$.

Performance Comparison on the MovieLens Dataset
The MovieLens dataset is one of the gold standards for testing collaborative filtering algorithms [4]. It contains 100,000 real ratings ($r \in \{1,\dots,5\}$) from 943 users on 1,682 movies, which were collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. For purposes of validation, the dataset is organized in five different splits, each containing a training set $R^O$ with 80,000 ratings and a test set with 20,000 ratings.

As we show in Fig. 3, our algorithm is the most accurate for all and each of the test sets, both in terms of the classification accuracy (that is, the fraction of predictions that are exactly correct) and in terms of the mean absolute error (the mean of the absolute value of the difference between the predicted and the real ratings).

To fully appreciate the importance of our improvement over existing algorithms, it is worth noting that, in terms of classification accuracy, the average improvement of the SBM approach over the best recommender (the Item-Item algorithm) represents a +38% of the improvement of the best recommender over the baseline (naive recommender). The improvement of SBM over SVD, relative to the improvement of SVD over the baseline, is +99%. These are major improvements, especially when compared to the differences that could be attributed to implementation details, which are small, as shown by the difference in performance between SVD1 and SVD2 (Fig. 3).

Characteristics of Sampled Partitions
As we have pointed out before, our approach offers the opportunity to study the collections of groupings that have the most explanatory power, namely, those that the Metropolis sampler visits. The MovieLens dataset includes some demographic information about users (such as gender and age) as well as some characteristics of the movies (such as genre). We use this information to assess whether the groupings we sample are indeed correlated with these user and movie characteristics, even though the stochastic block model does not take this information into account.

In particular, we study the co-classification of users [30], that is, the probability that two users $u$ and $u'$ belong to the same group,

$$p_{\rm SBM}(s_u = s_{u'}\,|\,R^O) = \frac{1}{Z} \sum_{P_U\in\mathcal{P}_U,\, P_I\in\mathcal{P}_I} \delta(s_u, s_{u'})\; e^{-H(P_U,P_I)}. \qquad (8)$$

We then plot the probability that a pair of users have the same gender, and their average age difference, as a function of their co-classification probability (Fig. 4A). We observe that user co-classification is strongly correlated with both demographic properties. For example, a pair of users that are very unlikely to belong to the same group have the same gender 58% of the time, whereas pairs of users that are almost surely in the same group have the same gender 83% of the time.

Figure 3. Algorithm comparison for real ratings from the MovieLens dataset.

Similarly, we plot the genre overlap (see Methods) between two movies as a function of their co-classification probability (Fig. 4B). Again, we observe a strong correlation, which indicates that the stochastic block model correctly picks groups that are related to movie content, even without having access to such information.
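Given the partitions visited by the sampler, Eq. (8) reduces to the fraction of samples in which two users share a group. A minimal sketch (ours, not the authors' code; it assumes sampled partitions are stored as user-to-group dictionaries, drawn with probability proportional to $e^{-H}$):

```python
from collections import defaultdict
from itertools import combinations

def coclassification(sampled_partitions):
    """Estimate the co-classification probability of Eq. (8) for every
    pair of users, given a list of sampled user partitions."""
    users = sorted(sampled_partitions[0])
    together = defaultdict(int)
    for part in sampled_partitions:
        for u, v in combinations(users, 2):
            together[(u, v)] += part[u] == part[v]
    n = len(sampled_partitions)
    return {(u, v): together[(u, v)] / n for u, v in combinations(users, 2)}

# Three hypothetical sampled partitions of users A, B, C.
samples = [{"A": 0, "B": 0, "C": 1},
           {"A": 0, "B": 1, "C": 1},
           {"A": 0, "B": 0, "C": 2}]
p_co = coclassification(samples)  # A and B share a group in 2 of 3 samples
```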
Discussion
We have shown that a Bayesian approach based on the block structure of social networks gives predictions of human preferences that are significantly and considerably more accurate than those of leading collaborative filtering recommender algorithms.

Like any other approach, ours has shortcomings. In particular, it is worth noting that the gain in accuracy comes at the expense of computational cost: Metropolis sampling of the user and item partition space is computationally demanding. Although we are able to run the algorithm on the MovieLens dataset, with approximately 1,000 users and items and 100,000 ratings, handling even one order of magnitude more might be challenging. Besides parallelizing the sampling process (which is straightforward), we think that two approaches could significantly reduce the computational cost: (i) finding analytical approximations to Eq. (4), or even an exact series expansion in terms of the ratings matrix; (ii) implementing a belief propagation algorithm [23,31] to replace Monte Carlo sampling.

In any case, we consider that the advantages of our approach outweigh its shortcomings. Not only does our algorithm provide better predictions, but it also has some desirable features: it is mathematically rigorous, it is based on plausible social models, and it sheds light on decision-making processes.

With respect to mathematical rigor, the Bayesian approach is the complete and correct probabilistic treatment of the observations. As a result, we obtain an estimate of the whole probability distribution for each rating, $p(r_{ui}=r\,|\,R^O)$. From this, we can choose how to make predictions (the most likely rating, the mean, the median, and others). In contrast, recommender systems like those based on matrix factorization give predictions that, in general, are not feasible ratings (for example, non-integer values when $r_{ui} \in \{1,\dots,5\}$), or that may even be outside the rating range. Additionally, these algorithms assume that ratings are linearly spaced in the "psychological scale" of users (that is, that the perceived difference between any two consecutive ratings is the same), which is known not to be true [4].

Finally, our approach is based on models that were originally defined, and are widely used, to explain how social agents establish relationships, and is therefore in a better position to illuminate which social and psychological factors determine human preferences. As an interesting byproduct of this, we note that it is possible to use our approach to infer demographic properties from ratings alone, a subject that is of much current interest [32].

Methods
Derivation of the Rating Equations
Here, we show how we derive the expression for the probability of a given rating (Eq. (4)) starting from the general Bayesian formulation of the problem (Eq. (2)). In a stochastic block model, users and items are partitioned into groups, and the probability that a user rates an item with $r_{ui}=r$ depends, exclusively, on the groups $s_u$ and $s_i$ to which the user and the item belong, that is,

$$p(r_{ui}=r\,|\,M) = q_r(s_u,s_i) \in [0,1], \qquad (9)$$

with $\sum_r q_r(s_u,s_i) = 1$. Other than this normalization constraint, $q_r(s_u,s_i)$ can take any value between 0 and 1.

As in the main text, we consider the case in which ratings can take $K$ different values $r \in \{1,\dots,K\}$. In this case, a model $M = (P_U, P_I, \{Q^1,\dots,Q^K\})$ is completely specified by a partition $P_U$ of the users, a partition $P_I$ of the items, and $K$ matrices $Q^r$, $r=1,\dots,K$, whose elements are $q_r(a,b)$. Then the likelihood of a model is

$$p(R^O\,|\,M) = \prod_{a\in P_U} \prod_{b\in P_I} \prod_{k=1}^K \left[q_k(a,b)\right]^{n^k_{ab}}, \qquad (10)$$

where $n^k_{ab}$ is the number of $k$-ratings observed from users in group $a$ to items in group $b$.

Putting together Eqs. (2), (9) and (10), and under the assumption of no prior knowledge about the models ($p(M)={\rm const.}$), we have

$$p(r_{ui}=r\,|\,R^O) = \frac{1}{Z} \sum_{P_U\in\mathcal{P}_U,\, P_I\in\mathcal{P}_I} \int dQ\; q_r(s_u,s_i) \prod_{a\in P_U} \prod_{b\in P_I} \prod_{k=1}^K \left[q_k(a,b)\right]^{n^k_{ab}}, \qquad (11)$$

where the integral is over all $q_k(a,b)$ within the subspace that satisfies the normalization constraints $\sum_k q_k(a,b) = 1$. These integrals factorize, and one is left with only two types of integrals to solve. For $a=s_u$, $b=s_i$ and $k=r$ we have (without loss of generality we consider the case $r=1$ and, for clarity, we drop the dependence of the $q_k$ on $a$ and $b$)

$$\int_0^1 dq_1\, q_1^{n_1+1} \int_0^{1-q_1} dq_2\, q_2^{n_2} \cdots \int_0^{1-q_1-\dots-q_{K-2}} dq_{K-1}\, q_{K-1}^{n_{K-1}}\, (1-q_1-\dots-q_{K-1})^{n_K} = \frac{(n_1+1)!\, n_2! \cdots n_K!}{(n_1+n_2+\dots+n_K+K)!}\,.$$

For all other terms we have

$$\int_0^1 dq_1\, q_1^{n_1} \int_0^{1-q_1} dq_2\, q_2^{n_2} \cdots \int_0^{1-q_1-\dots-q_{K-2}} dq_{K-1}\, q_{K-1}^{n_{K-1}}\, (1-q_1-\dots-q_{K-1})^{n_K} = \frac{n_1!\, n_2! \cdots n_K!}{(n_1+n_2+\dots+n_K+K-1)!}\,.$$

Using these expressions in Eq. (11), one obtains Eq. (4).

Figure 4. Characteristics of sampled partitions.
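As a consistency check (our addition, not part of the original derivation), for binary ratings ($K=2$) both expressions reduce to standard Beta integrals:

```latex
\int_0^1 dq_1\, q_1^{n_1+1} (1-q_1)^{n_2}
  = B(n_1+2,\, n_2+1)
  = \frac{(n_1+1)!\, n_2!}{(n_1+n_2+2)!},
\qquad
\int_0^1 dq_1\, q_1^{n_1} (1-q_1)^{n_2}
  = B(n_1+1,\, n_2+1)
  = \frac{n_1!\, n_2!}{(n_1+n_2+1)!},
```

in agreement with the general-$K$ formulas above evaluated at $K=2$.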
Sampling of the Partition Space
Uniformly sampling the space of user and movie partitions is necessary to get accurate estimates of $r_{ui}$ (Eq. (4)). The simplest way to sample user (or movie) partitions is to consider a random initial partition and then attempt moves of individual users from their current group to a new group, which is selected uniformly at random. However, this approach has the shortcoming of implicitly considering groups as distinguishable; for example, if node A is alone in group 1 and we move it to an empty group, the partition has not changed, but the algorithm considers it as different.

In fact, when there are as many potential user groups as there are users, considering groups as distinguishable has the effect of over-counting partitions by a factor $N_u!/(N_u-k_u)!$, where $N_u$ is the number of users and $k_u$ is the number of non-empty user groups in the partition.

Since, as we have said, sampling over partitions with distinguishable groups is easiest to implement, in practice we use a modified Hamiltonian that "penalizes" partitions that are otherwise over-counted,

$$H'(P_U,P_I) = H(P_U,P_I) - \ln\left[(N_u-k_u)!\right] - \ln\left[(N_m-k_m)!\right], \qquad (12)$$

where $H(P_U,P_I)$ is given by Eq. (5), and $N_m$ and $k_m$ are the corresponding quantities for movies.

Note that the additional terms in Eq. (12) are not a priori penalties to avoid over-fitting by models with many groups, but rather corrections to a sampling process that would otherwise be biased. The over-fitting problem, which is common to other approaches to inference of block models [23], is automatically solved by our marginalization over the $q_r$ probabilities.

For infinitely long samplings of the space of partitions, the correction in Eq. (12) exactly cancels the over-counting of certain partitions that our sampling method causes. For finite sampling times, one cannot be sure that the whole partition space is uniformly sampled.
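A single sampling move under the corrected Hamiltonian of Eq. (12) can be sketched as follows (our illustrative Python, not the authors' code; function and variable names are ours, and item-group moves are analogous):

```python
import math
import random

def H_corrected(ratings, gu, gi, K, Nu, Nm):
    """Modified Hamiltonian of Eq. (12): the H of Eq. (5) minus the
    log-factorial terms that compensate for distinguishable group labels."""
    ku, km = len(set(gu.values())), len(set(gi.values()))
    counts = {}
    for (u, i), r in ratings.items():
        counts.setdefault((gu[u], gi[i]), [0] * K)[r] += 1
    H = sum(math.lgamma(sum(n) + K) - sum(math.lgamma(c + 1) for c in n)
            for n in counts.values())
    # Group pairs with no observed ratings each contribute ln (K-1)!.
    H += (ku * km - len(counts)) * math.lgamma(K)
    return H - math.lgamma(Nu - ku + 1) - math.lgamma(Nm - km + 1)

def metropolis_step(ratings, gu, gi, K):
    """Attempt to move one random user to a group label chosen uniformly
    among as many labels as there are users; accept with the Metropolis rule."""
    Nu, Nm = len(gu), len(gi)
    u = random.choice(list(gu))
    old_group = gu[u]
    H_old = H_corrected(ratings, gu, gi, K, Nu, Nm)
    gu[u] = random.randrange(Nu)
    H_new = H_corrected(ratings, gu, gi, K, Nu, Nm)
    if random.random() >= math.exp(min(0.0, H_old - H_new)):
        gu[u] = old_group  # reject the move
```

Recomputing the full Hamiltonian at every step, as above, is only for clarity; in practice one would update only the terms of Eq. (5) that involve the moved user's old and new groups.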
To minimize this potential problem, we run short, parallel and independent sampling processes in different regions of the partition space, as opposed to a single long sampling process. This slightly improves our predictions of model and real ratings (although the improvement is small compared to the difference between our algorithm's performance and that of the other algorithms).

Benchmark Algorithms
In the naive recommender (Naive), the rating of user $u$ for item $i$ is simply the average rating of $i$ by all users,

$$r^N_{ui} = \frac{\sum_{u'\in U_i} r_{u'i}}{|U_i|}, \qquad (13)$$

where $U_i$ is the set of users that rated item $i$ and $|U_i|$ is the number of users in that set.

The matrix factorization method based on singular value decomposition (SVD) works as follows [17]. The matrix of ratings $R$ (with a number of rows $n_u$ that coincides with the number of users, and a number of columns $n_i$ that coincides with the number of items) can be decomposed, using singular value decomposition, into

$$R = PQ, \qquad (14)$$

where $P$ is an $n_u \times n_i$ matrix and $Q$ is an $n_i \times n_i$ matrix. If we denote the rows of matrix $P$ as $p^T_u$ and the columns of $Q$ as $q_i$, then individual ratings satisfy $r_{ui} = p^T_u q_i$. For the purpose of making recommendations, it is convenient to pose the decomposition problem as an optimization one; indeed, one can prove that $P$ and $Q$ are the solution of

$$\{p_u, q_i\} = \arg\min_{\tilde{p}_u, \tilde{q}_i} \sum_{u,i} \left(r_{ui} - \tilde{p}^T_u \tilde{q}_i\right)^2. \qquad (15)$$

In practice, to estimate unobserved ratings one needs to take into consideration a number of important issues. First, SVD factorization can have a prohibitive computational cost because we typically deal with large $n_u$ and $n_i$, so the problem has to be dimensionally reduced. Second, only some user-item pairs are observed (namely, those $(u,i) \in O$). And third, users and items can have rating biases (for example, some users rate items higher than others, and some items are systematically highly rated). Ultimately, unobserved ratings $r^*_{ui}$ are estimated using

$$r^*_{ui} = p^{*T}_u q^*_i + \mu + b_u + b_i, \qquad (16)$$

where $b_u$ and $b_i$ are the biases of users and items, respectively, and $\mu$ is the average rating in $R^O$.
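The biased prediction rule just described, fit by Funk-style stochastic gradient descent on the regularized squared error, can be sketched as follows (our Python sketch, not the paper's implementation; hyperparameter values are illustrative only):

```python
import random

def sgd_factorize(ratings, n_users, n_items, K=5, lam=0.02, lr=0.01, epochs=500):
    """Biased matrix factorization: approximate r_ui by
    p_u . q_i + mu + b_u + b_i, fit by stochastic gradient descent."""
    rng = random.Random(0)
    mu = sum(ratings.values()) / len(ratings)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(n_items)]
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    obs = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(obs)
        for (u, i), r in obs:
            pred = mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))
            e = r - pred
            # Gradient steps on the regularized squared error.
            bu[u] += lr * (e - lam * bu[u])
            bi[i] += lr * (e - lam * bi[i])
            for f in range(K):
                pu_f = p[u][f]
                p[u][f] += lr * (e * q[i][f] - lam * pu_f)
                q[i][f] += lr * (e * pu_f - lam * q[i][f])
    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))
    return predict

# Tiny example with users 0-2 and items 0-2.
ratings = {(0, 0): 5, (0, 1): 4, (1, 0): 4, (1, 2): 1, (2, 1): 5, (2, 2): 2}
predict = sgd_factorize(ratings, 3, 3)
```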
The vectors $p^*_u$ and $q^*_i$ are dimensional reductions of the original $p_u$ and $q_i$, and have length $K < n_u, n_i$. They are obtained by solving the optimization problem

$$\{p^*_u, q^*_i\} = \arg\min_{\tilde{p}_u, \tilde{q}_i} \sum_{(u,i)\in O} \left(r_{ui} - \tilde{p}^T_u \tilde{q}_i - \mu - b_u - b_i\right)^2 + \lambda \sum_{u,i} \left(\|\tilde{p}_u\|^2 + \|\tilde{q}_i\|^2\right). \qquad (17)$$

As Funk originally proposed [17], we solve this problem numerically using the stochastic gradient descent algorithm [33]. In our implementation of the algorithm (SVD1) and in the LensKit implementation (SVD2), the number of latent dimensions $K$ and the learning rate (0.002 for SVD2) are set as suggested in Ref. [4].

Finally, the algorithm based on the similarity between items (Item-Item) works as follows [29]. One starts by defining a similarity between items, which in our case is the cosine between the item rating vectors (conveniently adjusted to remove user biases towards higher or lower ratings [29]). The predicted rating $r_{ui}$ is the similarity-weighted average over the $K$ closest neighbors of $i$ that user $u$ has rated. Once more, we use the default, optimized implementation of the algorithm, with its default neighborhood size $K$, in LensKit [4].

Movie Genre Overlap
Each movie $i$ in the MovieLens dataset is labeled with one or more genres $G_i$. We define the genre overlap $o_{ii'}$ between two movies as the Jaccard index of the corresponding genre sets,

$$o_{ii'} = \frac{|G_i \cap G_{i'}|}{|G_i \cup G_{i'}|}, \qquad (18)$$

that is, the ratio between the number of genres shared by the two movies and the total number of genres with which they are labelled.

Author Contributions
Conceived and designed the experiments: RG AL EM MSP. Performed the experiments: RG AL. Analyzed the data: RG AL EM MSP. Contributed reagents/materials/analysis tools: RG AL EM MSP. Wrote the paper: RG AL EM MSP.
References
1. (2010) The data deluge. The Economist, Feb. 25th.
2. Adomavicius G, Tuzhilin A (2005) Towards the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE T Knowl Data En 17: 734–749.
3. Melville P, Sindhwani V (2010) Recommender systems. In: Encyclopedia of Machine Learning. Springer-Verlag.
4. Ekstrand MD, Ludwig M, Konstan JA, Riedl JT (2011) Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit. Proceedings of the Fifth ACM Conference on Recommender Systems: 133–140.
5. Watts DJ (2007) A twenty-first century science. Nature 445: 489.
6. Lazer D, Pentland A, Adamic L, Aral S, Barabási AL, et al. (2009) Computational social science. Science 323: 721–723.
7. Brockmann D, Hufnagel L, Geisel T (2006) The scaling laws of human travel. Nature 439: 462–465.
8. Song C, Qu Z, Blumm N, Barabási AL (2010) Limits of predictability in human mobility. Science 327: 1018–1021.
9. Malmgren RD, Stouffer DB, Campanharo ASLO, Amaral LAN (2009) On universality in human correspondence activity. Science 325: 1696–1700.
10. Salganik MJ, Dodds PS, Watts DJ (2006) Experimental study of inequality and unpredictability in an artificial cultural market. Science 311: 854–856.
11. Christakis N, Fowler J (2007) The spread of obesity in a large social network over 32 years. New Engl J Med 357: 370–379.
12. Cohen-Cole E, Fletcher J (2008) Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic. J Health Econ 27: 1382–1387.
13. Fowler J, Christakis N (2008) Estimating peer effects on health in social networks: A response to Cohen-Cole and Fletcher; and Trogdon, Nonnemaker, and Pais. J Health Econ 27: 1400.
14. Vespignani A (2009) Predicting the behavior of techno-social systems. Science 325: 425–428.
15. Iribarren JL, Moro E (2009) Impact of human activity patterns on the dynamics of information diffusion. Phys Rev Lett 103: 038702.
16. Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009: 4:2–4:2.
17. Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42: 30–37.
18. Breiger RL, Boorman SA, Arabie P (1975) An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. J Math Psychol 12: 328–383.
19. White HC, Boorman SA, Breiger RL (1976) Social structure from multiple networks. I. Blockmodels of roles and positions. Am J Sociol 81: 730–780.
20. Holland PW, Laskey KB, Leinhardt S (1983) Stochastic blockmodels: First steps. Soc Networks 5: 109–137.
21. Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96: 1077–1087.
22. Guimerà R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. Proc Natl Acad Sci U S A 106: 22073–22078.
23. Decelle A, Krzakala F, Moore C, Zdeborová L (2011) Inference and phase transitions in the detection of modules in sparse networks. Phys Rev Lett 107: 065701.
24. Newman MEJ (2011) Communities, modules and large-scale structure in networks. Nat Phys 8: 25–31.
25. Jaynes ET (2003) Probability Theory: The Logic of Science. Cambridge University Press.
26. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: A tutorial. Statistical Science 14: 382–417.
27. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21: 1087–1092.
28. Paterek A (2007) Improving regularized singular value decomposition for collaborative filtering. In: Proc. KDD Cup Workshop at SIGKDD'07, 13th ACM Int. Conf. on Knowledge Discovery and Data Mining. 39–42.
29. Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. New York, NY, USA: ACM, WWW '01, 285–295. doi:10.1145/371920.372071.
30. Sales-Pardo M, Guimerà R, Moreira AA, Amaral LAN (2007) Extracting the hierarchical organization of complex systems. Proc Natl Acad Sci U S A 104: 15224–15229.
31. Yedidia JS, Freeman WT, Weiss Y (2003) Understanding belief propagation and its generalizations. In: Exploring Artificial Intelligence in the New Millennium. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 239–269.
32. Malmgren RD, Hofman JM, Amaral LAN, Watts DJ (2009) Characterizing individual communication patterns. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 607–615.
33. Gardner W (2003) Learning characteristics of stochastic-gradient-descent algorithms: A general study, analysis, and critique. Signal Process 6: 113–133.
Newman MEJ (2011) Communities, modules and large-scale structure innetworks. Nat Phys 8: 25–31.25. Jaynes ET (2003) Probability Theory: The Logic of Science. CambridgeUniversity Press.26. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian modelaveraging: A tutorial. Statistical Science 14: 382417.27. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953)Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092.28. Paterek A (2007) Improving regularized singular value decomposition forcollaborative filtering. In: Proc. KDD Cup Workshop at SIGKDD’07, 13thACM Int. Conf. on Knowledge Discovery and Data Mining. 39–42.29. Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborativefiltering recommendation algorithms. In: Proceedings of the 10th internationalconference on World Wide Web. New York, NY, USA: ACM, WWW ’01, 285–295. doi:10.1145/371920.372071.30. Sales-Pardo M, Guimera` R, Moreira AA, Amaral LAN (2007) Extracting thehierarchical organization of complex systems. Proc Natl Acad Sci USA 104:15224–15229.31. Yedidia JS, Freeman WT, Weiss Y (2003) Exploring artificial intelligence in thenew millennium. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,chapter Understanding belief prop-agation and its generalizations. 239–269.32. Malmgren RD, Hofman JM, Amaral LAN, Watts DJ (2009) Characterizingindividual communication patterns. In: Proceedings of the 15th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining. 607–615.33. Gardner W (2003) Learning characteristics of stochastic-gradient-descentalgorithms: A general study, analysis, and critique. Signal Process 6: 113–133.