Generating Artificial Core Users for Interpretable Condensed Data
A Preprint
Amy Nesky
Computer Science and Engineering, University of Michigan, Ann Arbor
Ann Arbor, MI 48109
[email protected]
Quentin F. Stout
Computer Science and Engineering, University of Michigan, Ann Arbor
Ann Arbor, MI 48109
[email protected]
February 9, 2021

Abstract
Recent work has shown that in a dataset of user ratings on items there exists a group of Core Users who hold most of the information necessary for recommendation. This set of Core Users can be as small as 20 percent of the users. Core Users can be used to make predictions for out-of-sample users without much additional work. Since Core Users substantially shrink a ratings dataset without much loss of information, they can be used to improve recommendation efficiency. We propose a method, combining latent factor models, ensemble boosting and K-means clustering, to generate a small set of Artificial Core Users (ACUs) from real Core User data. Our ACUs have dense rating information, and improve the recommendation performance of real Core Users while remaining interpretable.
Keywords: Recommender Systems · Core Users · Synthetic Data Generation
Recommender Systems improve the user experience by providing personalized, curated recommendations in a large and complex information space. Collaborative Filtering methods exploit community data to uncover correlations between users and items that can be used to make recommendations. Within collaborative filtering, latent factor models are considered state-of-the-art; these models approximate highly redundant data with low rank matrix decompositions (Hu et al., 2008; Koren, 2008; Koren et al., 2009; Shi et al., 2014; Aggarwal, 2016).

Unfortunately, many matrix factorization methods are transductive and cannot be leveraged to make predictions for users outside of the original training set (Aggarwal, 2016). Orthonormal matrix factors can be used more easily to make out-of-sample predictions because the matrix factors capture geometric traits of the item and user spaces. However, obtaining orthonormal factors can be expensive for large datasets.

The majority of research in recommender systems focuses on developing new collaborative filtering and hybrid methods, which are often highly optimized for a specific application. There is a smaller body of research on identifying the most useful users, those who carry most of the relevant information, and separating them out as Core Users for making recommendations. With a smaller core set of users, making out-of-sample predictions becomes a reasonable task. Reducing a dataset's size without much loss of information improves recommendation efficiency as well as storage costs; numerous fields outside of recommender systems would benefit from this ability.

However, available Core User methods have limited representation ability, in that Core Users are selected from existing user data.
In other words, the recommendation success of Core Users is bounded by the quality of the user data available. Improving the recommendation accuracy of Core Users makes data abstraction more effective for applications in data augmentation, bots mimicking population behavior, data mining, privacy, statistics and many more. In this paper, we develop a method of generating Artificial Core Users (ACUs) that improves the recommendation accuracy of real Core Users. We combine latent factor models, ensemble boosting and K-means clustering to generate a small set of ACUs from real Core User data. Our ACUs incur a small amount of additional memory storage when compared to real Core Users, but remain a reduction in memory storage compared to the original dataset. Artificial Core Users improve the recommendation accuracy of real Core Users while remaining good centroids for the complete recommendation dataset. Since ACUs act as good centroids for the complete dataset, they blend in well with the real dataset even though they are generated artificially. But unlike real Core Users, ACUs have complete ratings on all items, providing more immediately interpretable information to scientists.
The inductive matrix completion problem assumes that a ratings matrix is generated by applying feature vectors to a known low-rank matrix (Jain and Dhillon, 2013; Zhang et al., 2018). This is relevant for making out-of-sample recommendations if we assume that the latent factors contain some ground truth about the dataset that applies to out-of-sample users. As mentioned above, latent factors produced by most methods are unfortunately transductive (Aggarwal, 2016). To combat this, some have worked on improving the efficiency of using a singular value decomposition in recommender systems (Sarwar et al., 2002).

Clustering is one of the earliest attempts to decrease the number of users needed to make recommendations (Aggarwal, 2016); rating predictions can be made using only information from the relevant cluster. Alternatively, Zeng et al. (2014) study the relevance of different users and find that there exists an "information core" made up of some key users. They found that the number of Core Users is around 20 percent of the entire dataset, and that the recommendation accuracy produced by relying only on the Core Users can reach 90 percent of that produced using every user in the dataset. Zeng et al. (2014) use a generalized K-nearest neighbor algorithm with various relevancy metrics to measure 'nearness'. They ran experiments using degree-based, resource-based and similarity-based measures to select their Core Users; all of these measures are graphical in nature. Since this work, a few other methods for selecting Core Users have emerged. Li et al. (2016) use a long-tail-distribution-based measure to select their core users. Cao and Kuang (2016) introduced a new measure to identify Core Users based on trust relationships and interest similarity; this work extends beyond graphical knowledge to include the semantic meaning of items. Kuang et al. (2017) use a combination of the measures proposed in (Cao and Kuang, 2016) and (Li et al., 2016).

Recently, deep learning has made an appearance in recommender systems in the form of either integration models or neural network models (Zhang et al., 2017). Integration models use neural networks to uncover features in auxiliary information, like item descriptions (Wang et al., 2015, 2016), user profiles (Li et al., 2015) and knowledge bases (Zhang et al., 2016). The uncovered features are then incorporated into a collaborative filtering framework to produce hybrid recommendations. Neural network models, on the other hand, perform collaborative filtering directly by modeling the interaction function between users and items (Sedhain et al., 2015; Strub et al., 2016; Wu et al., 2016; He et al., 2017; Li et al., 2018). Using deep learning to make recommendations means the models are better equipped to recognize nonlinear relationships in the data, but it also means the models inherit all the training difficulties neural networks face, compounded by the difficulties of working with sparse data.

There is a small body of work that injects fake users into a recommendation system for either adversarial goals (O'Mahony et al., 2002; Lam and Riedl, 2004; Christakopoulou and Banerjee, 2018) or utilitarian data augmentation goals (Sarwar et al., 1998). Typically, the fake users are hand-coded, but Christakopoulou and Banerjee (2018) create adversarial fake user profiles for a recommendation system using generative adversarial nets. Sarwar et al. (1998), on the other hand, used simple content filtering bots to generate a few users with dense ratings to improve recommendations. Tackling the new user, or cold start, problem in recommender systems has also brought about some work on determining which item preferences and item features are most informative (Rashid et al., 2002, 2008; Seroussi et al., 2011).
We propose an offline technique to generate a small set of Artificial Core Users (ACUs) whose dense ratings matrix can be stored in condensed, low order form and used to make recommendations for out-of-sample users without much additional work. This method uses latent factor models, ensemble boosting and K-means clustering to learn a small group of artificial users whose feature vectors capture correlations in the full dataset. This is a collaborative filtering method that uses an existing sparse ratings matrix for a set of users and items.

The algorithm requires a set of training users represented by their ratings on a given set of items, which will be used to update ACU ratings during learning. To measure the abstraction of the ACUs, we split the dataset into two mutually exclusive sets: testing users and training users. Let R be an m × n sparse ratings matrix for m training users and n items. The ACU set size will be small relative to the size of the dataset; if s is the number of ACUs, we pick s such that s ≪ m. Let R_ACU denote the s × n ratings matrix for our ACUs. There are a number of ways one could initialize the ACU ratings; in our work, ACU ratings will be initialized from Core User ratings.

The training algorithm is given in Algorithm 1. We trained row blocks of R_ACU together using batches of training users. By learning ACUs in blocks, we incorporate boosting results and reduce our workload (Schapire, 1990; Breiman, 1998). Algorithm 1 learns orthonormal decompositions for each block, leveraging the relevance of orthonormal decompositions to out-of-sample relationships. Let R^(b)_ACU denote the b-th row block in R_ACU, and R_i denote the sparse ratings matrix of the i-th batch of training users. For simplicity, we assume that each block of R_ACU has the same number of rows and, similarly, that each training batch has the same number of users. We divide R_ACU into ζ equal sized row blocks such that for b ∈ {1, . . . , ζ}, R^(b)_ACU is an (s/ζ × n) matrix. Let I denote the training batch index set such that for i ∈ I, R_i is an (m/|I| × n) matrix. The variables α and β are learning rates, while the variables λ and γ are regularization coefficients. Lines 15 and 20 in Algorithm 1 are derived from stochastic gradient descent algorithms. Line 18 updates the rows of U_ACU as the (s/ζ)-means of the m/|I| row vectors in U_i, as described in detail in Algorithm 3. Embedding Algorithm 3 inside Algorithm 1 effectively trains the Artificial Core User user-space matrix, U_ACU, with Mini-Batch K-means (Sculley, 2010). By incorporating Algorithm 3 into the learning process we expect to keep the resulting Artificial Core User ratings matrix interpretable. That is, we expect the resulting Artificial Core User ratings to resemble those of real users, but provide complete rating information in place of sparse data.
As with any learning method, the amount of regularization will be problem dependent; one may find that λ, γ = 0 suffices because lines 16 and 23-24 in Algorithm 1 have a regularizing effect. Lines 23-24 ensure that we don't stray too far from an orthonormal decomposition during learning. The while loop on line 10 takes a simplistic approach to the inductive matrix completion problem given a set of features; this subprocess could likely be improved by existing work (Jain and Dhillon, 2013; Zhang et al., 2018).

Algorithm 1 should be run offline to produce a set of Artificial Core Users. Afterwards, one can use V_ACU S_ACU to make faster recommendations for out-of-sample users. One may also be interested in using R_ACU as a condensed version of their dataset. This process resembles the way one would train a neural network and shares similarities with neural network models (Sedhain et al., 2015; Strub et al., 2016; Wu et al., 2016; He et al., 2017; Li et al., 2018). Neural network models generally use autoencoders; a one-layer Neural Collaborative Autoencoder with no output activation can be reformulated to parallel matrix factorization (Li et al., 2018). Unlike Algorithm 1, these existing models are not concerned with interpretability.

We evaluate our work using two metrics: item vectors testing error and sparse K-means error. To measure the item vectors testing error, we use Algorithm 2 on an independent set of testing users. Algorithm 2 is a simplistic recommender model that tries to predict missing user ratings. It decomposes the row blocks of R_ACU into orthogonal components using a singular value decomposition. Then, for a set of real testing users, Algorithm 2 optimizes a set of unit vectors to accompany the item vectors extracted from the decomposition of R_ACU; there is no new learning done in the item vectors. In this way, Algorithm 2 evaluates the generality of the item vectors extracted from R_ACU and the potential for high quality recommendations using a more elaborate learning model. It should be noted that, as a simplistic recommender model, Algorithm 2 produces far from state-of-the-art recommendation results, but it is useful for our purpose of evaluating whether Artificial Core Users better capture the information in a recommendation dataset than real Core Users do. To compute the sparse K-means error of R_ACU we make a few modifications to Algorithm 3 to accept a sparse first input matrix; this results in Algorithm 4. Algorithm 4 estimates how well the Artificial Core Users represent an average collection of real users by measuring how well the ACUs serve as centroids for our real testing users.

We chose to normalize the variance and mean center the rows of R before proceeding; any adjustments made here can be accounted for at recommendation time. As with any learning process, one can see a lot of improvement by selecting the right initialization; there is room to try supplementing Core User ratings to improve the initialization of R_ACU. Computing the singular value decomposition of a large matrix is known to be time consuming, so learning in blocks saves quite a bit of time.
Algorithm 1 Generate_ACUs(R)
1:  Let m be the number of users (row dimension of R) and n be the number of items (column dimension of R).
2:  Initialize constants α, λ, β, γ. Initialize matrix R_ACU with dimensions s × n.
3:  while not done do
4:      for b = 1 to ζ do
5:          Find orthonormal (s/ζ × k) matrix U^(b)_ACU, orthonormal (n × k) matrix V^(b)_ACU and diagonal (k × k) matrix S^(b)_ACU such that U^(b)_ACU S^(b)_ACU (V^(b)_ACU)^T ≈ R^(b)_ACU and k ≤ min(s/ζ, n).
6:          Set V^(b)_ACU = V^(b)_ACU S^(b)_ACU.
7:          Select i randomly from the training user batch index set I.
8:          Initialize U_i.
9:          j = 0.
10:         while not done do
11:             Compute sparse E = R_i − U_i (V^(b)_ACU)^T for the specified entries of R_i.
12:             Other entries of E remain zero.
13:             if not done then
14:                 if j mod 2 = 0 then
15:                     U_i ← (1 − βγ) U_i + β E V^(b)_ACU.
16:                     Normalize the columns of U_i.
17:                 else if j mod 4 = 1 then
18:                     Means_Update(U_i, U^(b)_ACU).
19:                 else
20:                     V^(b)_ACU ← (1 − αλ) V^(b)_ACU + α E^T U_i.
21:                 end if
22:                 R^(b)_ACU = U^(b)_ACU (V^(b)_ACU)^T.
23:                 Find orthonormal matrices U^(b)_ACU, V^(b)_ACU and diagonal matrix S^(b)_ACU such that U^(b)_ACU S^(b)_ACU (V^(b)_ACU)^T ≈ R^(b)_ACU.
24:                 Set V^(b)_ACU = V^(b)_ACU S^(b)_ACU.
25:                 j = j + 1.
26:             end if
27:         end while
28:     end for
29: end while
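To make the alternating updates concrete, the following is a minimal NumPy sketch of Algorithm 1's inner loop for a single block and a single training batch. It is illustrative only, not the authors' code: the toy data, the hyperparameter values, and the omission of the Means_Update step (line 18) are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mask = rng.random((40, 25)) < 0.3            # pattern of observed ("specified") entries
R_i = rng.normal(size=(40, 25)) * mask       # dense stand-in for one sparse training batch

k = 5                                        # latent factors per block
alpha, beta = 0.01, 0.01                     # learning rates
lam, gam = 0.1, 0.1                          # regularization coefficients

# SVD warm start; fold S into V as the paper does (V <- V S).
U0, s, Vt = np.linalg.svd(R_i, full_matrices=False)
V = Vt[:k].T * s[:k]                         # (n x k) scaled item vectors
U = U0[:, :k]                                # (batch x k) user factor, warm-started

for j in range(200):
    E = (R_i - U @ V.T) * mask               # error only on specified entries
    if j % 2 == 0:
        # line 15: gradient step on the user batch factor, with weight decay
        U = (1 - beta * gam) * U + beta * E @ V
        U /= np.linalg.norm(U, axis=0, keepdims=True)   # line 16: normalize columns
    else:
        # line 20: gradient step on the scaled item factor
        V = (1 - alpha * lam) * V + alpha * E.T @ U

err = np.abs((R_i - U @ V.T)[mask]).mean()   # mean absolute error on observed entries
```

The masked error `err` should fall below the baseline of always predicting zero, since the SVD warm start already captures the dominant structure and the gradient steps refine the fit on the observed entries.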
Algorithm 2 Item_Vectors_Testing_Error(R, R_ACU, ζ)
1:  Let m be the number of users (row dimension of R), n be the number of items (column dimension of R and R_ACU) and s be the number of ACUs (row dimension of R_ACU).
2:  Initialize constants β, γ.
3:  for b = 1 to ζ do
4:      Find orthonormal (s/ζ × k) matrix U^(b)_ACU, orthonormal (n × k) matrix V^(b)_ACU and diagonal (k × k) matrix S^(b)_ACU such that U^(b)_ACU S^(b)_ACU (V^(b)_ACU)^T ≈ R^(b)_ACU and k ≤ min(s/ζ, n).
5:  end for
6:  Let V_ACU = [V^(1)_ACU · · · V^(ζ)_ACU].
7:  Aggregate S^(b)_ACU for b = 1 to ζ along diagonal blocks to form the diagonal matrix S_ACU.
8:  Set V_ACU = V_ACU S_ACU.
9:  Randomly select a subset of the nonzero entries in R as training entries and let the other entries be the probe entries.
10: Define R^(T) as the sparse matrix made up of the training entries of R, and define R^(P) as the sparse matrix made up of the probe entries of R, such that R = R^(T) + R^(P).
11: Initialize U.
12: while not done do
13:     Compute sparse E = R^(T) − U V_ACU^T for the specified entries of R^(T). Other entries of E remain zero.
14:     if not done then
15:         U ← (1 − βγ) U + β E V_ACU.
16:         Normalize the columns of U.
17:     end if
18: end while
19: Compute sparse E = R^(P) − U V_ACU^T for the specified entries of R^(P).
20: Return the average absolute value of the entries of E over the specified entries of R^(P).
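The evaluation idea behind Algorithm 2, fitting user factors against frozen item vectors on a training split and scoring on held-out probe entries, can be sketched as follows. The toy data, the 80/20 split ratio, the step size, and the iteration count are assumptions; real ratings data would be sparse and noisy rather than an exact low-rank product.

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, k = 30, 20, 4

V = rng.normal(size=(n_items, k))            # frozen "item vectors" extracted elsewhere
U_true = rng.normal(size=(n_users, k))
R = U_true @ V.T                             # fully observed toy ratings matrix

train_mask = rng.random(R.shape) < 0.8       # 80/20 train/probe split (assumed)
probe_mask = ~train_mask

U = rng.normal(size=(n_users, k))
beta = 0.01
for _ in range(1000):
    E = (R - U @ V.T) * train_mask           # error on training entries only
    U = U + beta * E @ V                     # fit U; no new learning in V

# report error on the held-out probe entries
probe_err = np.abs((R - U @ V.T)[probe_mask]).mean()
```

Because the item vectors here generated the data exactly, the probe error shrinks toward zero; with real data it instead measures how general the extracted item vectors are.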
Algorithm 3 Means_Update(A, B)
1:  Initialize empty lists L_l for l = 0 to the row dimension of B.
2:  for r = 0 to the row dimension of A do
3:      min_val = ∞, min_index = 0.
4:      for l = 0 to the row dimension of B do
5:          if the distance between the r-th row of A and the l-th row of B is less than min_val then
6:              min_val = this distance. min_index = l.
7:          end if
8:      end for
9:      L_min_index.push(r).
10: end for
11: for l = 0 to the row dimension of B do
12:     if list L_l is non-empty then
13:         Set the l-th row of B to the weighted average of the rows of A with indices stored in list L_l.
14:     end if
15: end for
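An executable version of the Means_Update pass can be written in a few lines of NumPy. This is a sketch under one simplifying assumption: the paper's weighted average is replaced here by a plain mean, since the weights are not specified in this section.

```python
import numpy as np

def means_update(A, B):
    """One K-means-style pass, sketching Algorithm 3: assign each row of A
    to its nearest row of B, then replace every row of B that received at
    least one assignment with the mean of its assigned rows (a plain mean
    is assumed in place of the paper's weighted average)."""
    # squared Euclidean distance between every row of A and every row of B
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)                # nearest centroid index per row of A
    B = B.copy()
    for l in range(B.shape[0]):
        members = A[assign == l]
        if len(members) > 0:                 # leave empty centroids untouched
            B[l] = members.mean(axis=0)
    return B, assign
```

For example, with two well-separated clusters of rows in `A`, the returned `B` rows move to the cluster means, which is exactly the behavior Algorithm 1 relies on when it updates U_ACU from U_i.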
Algorithm 4 Sparse_Means_Error(R, R_ACU)
1:  avg_error = 0. entry_count = 0.
2:  for r = 0 to the row dimension of R do
3:      min_val = ∞, min_index = 0.
4:      for l = 0 to the row dimension of R_ACU do
5:          if the sparse distance between the r-th row of R and the l-th row of R_ACU, computed over the specified entries of R, is less than min_val then
6:              min_val = this distance. min_index = l.
7:          end if
8:      end for
9:      for each sparse entry in the r-th row of R do
10:         entry_count += 1.
11:         temp_err = the absolute difference between the entry of the r-th row of R and the corresponding entry in the min_index row of R_ACU.
12:         avg_error += (temp_err − avg_error) / entry_count.
13:     end for
14: end for
15: Return avg_error.
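A direct translation of Algorithm 4 into NumPy follows; it is a sketch, with one representational assumption: unobserved entries of the sparse matrix R are encoded as NaN rather than in a compressed sparse format.

```python
import numpy as np

def sparse_means_error(R, R_acu):
    """Sketch of Algorithm 4: treat NaN entries of R as unobserved, assign
    each real user to the nearest ACU under a distance computed only over
    that user's observed entries, and return the running average absolute
    difference on those entries."""
    avg_error, count = 0.0, 0
    for r in range(R.shape[0]):
        obs = ~np.isnan(R[r])
        if not obs.any():
            continue
        # nearest ACU under the sparse (observed-entries-only) distance
        d = ((R_acu[:, obs] - R[r, obs]) ** 2).sum(axis=1)
        nearest = R_acu[d.argmin()]
        for j in np.where(obs)[0]:
            count += 1
            err = abs(R[r, j] - nearest[j])
            avg_error += (err - avg_error) / count   # incremental mean
    return avg_error
```

The incremental-mean update mirrors line 12 of Algorithm 4, so the function returns the average absolute error over all observed entries without storing them.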
Our main objective in our experiments is comparability across methods rather than state-of-the-art performance. To make the error on the probe entry sets comparable when testing with Algorithm 2, we aimed for similar errors on the training set; to accomplish this goal we stopped the algorithm early when necessary. All experiments were run on the Bridges NVIDIA P100 GPUs through the Pittsburgh Supercomputing Center. We ran experiments using the MovieLens ml-20m dataset (Harper and Konstan, 2015). We normalized the variance and mean centered the rows of the dataset ratings matrix before proceeding.

We compare our work for generating ACUs to existing methods for selecting Core Users from (Zeng et al., 2014; Li et al., 2016; Cao and Kuang, 2016; Kuang et al., 2017). All of the methods for finding Core Users tested here are derived from the K-nearest neighbor algorithm. These methods use some metric to determine the pair-wise similarity between all existing users. Then, for each user, a list of the top-K most similar users is generated; from these combined lists the Core Users are selected. In our Core User experiments, we collect the top-50 most similar users for each user. Existing methods differ in their pair-wise similarity metric for users, and in their selection method within the compiled top-K most similar user lists.

Recall that the cosine similarity of two vectors A and B is (A · B)/(||A|| · ||B||). In all of our Core User experiments, the pair-wise similarity between all existing users is calculated in one of two ways: it is either the cosine similarity of the users' ratings vectors, or the cosine similarity of their boolean vectors, where a boolean vector indicates only whether or not an item has been rated, not how well the user liked it.

Once the lists of the top-K most similar users to each user have been generated, one can either count the frequency of a given user's appearance in the top-K lists and take the most frequent users as the Core Users, or one can weight a user's appearance in a list by the inverse of the rank at which that user appeared within the list; we refer to the first approach as frequency-based and the second as rank-based.

The final variant is whether or not to consider 'hidden' ratings in the cosine similarity of users. Without considering hidden ratings, two users with no overlapping rated items would have zero similarity, but if the users have rated items that are similar to one another this metric seems insufficient. Cao and Kuang (2016) and Kuang et al. (2017) consider semantic relationships between items to determine item similarity; here we use the cosine similarity of the item vectors, where an item vector is the rating data from all of the users for the given item. For each item, a list of the top-K most similar items is generated. After computing the item similarity, a missing rating may be substituted with a weighted average of similar rated items, where the item similarity is used as the weight. Any adjustments made here can be accounted for at recommendation time.
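The frequency-based and rank-based selection schemes described above can be sketched as follows. This is an illustrative toy implementation, not the authors' code; the function names, the toy matrix, and the tie-breaking behavior of `argsort` are all assumptions.

```python
import numpy as np

def cosine_sim(M):
    """Pair-wise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # guard against all-zero rows
    return (M @ M.T) / (norms * norms.T)

def core_user_scores(R, K=2, rank_based=True):
    """Score users by their appearances in everyone's top-K similarity lists:
    rank-based weights an appearance by the inverse of its rank, while
    frequency-based simply counts appearances."""
    S = cosine_sim(R)
    np.fill_diagonal(S, -np.inf)             # a user is not its own neighbor
    scores = np.zeros(R.shape[0])
    for u in range(R.shape[0]):
        top = np.argsort(-S[u])[:K]          # top-K most similar users to u
        for rank, v in enumerate(top):
            scores[v] += 1.0 / (rank + 1) if rank_based else 1.0
    return scores

# Core Users would then be the highest-scoring users, e.g.:
# core = np.argsort(-core_user_scores(R))[:num_core]
```

Swapping the ratings matrix `R` for its boolean (rated / not rated) version, or pre-filling hidden ratings from item similarities, reproduces the other selection variants described above.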
The MovieLens ml-20m dataset contains ratings from 138493 users on a set of 27278 movies (Harper and Konstan, 2015). This section discusses the performance of ACUs compared to existing Core User methods using the MovieLens ml-20m dataset.

Table 1: Item vectors testing error of 13000 Core Users collected with previously existing methods using the ml-20m dataset (Harper and Konstan, 2015). We averaged the results of Algorithm 2 over 75 runs, where each run was given 200 independent testing users and 2600 of the Core User item vectors (20 percent of the singular values). In each run, we stopped Algorithm 2 when we reached a training error of 0.2.

Item Similarity Used | Ratings Used | Frequency-Based | Probe Entry Set Error
yes                  | yes          | yes             | 1.41
yes                  | yes          | no              |
yes                  | no           | yes             | 1.71
yes                  | no           | no              | 1.67
no                   | yes          | yes             | 1.45
no                   | yes          | no              | 1.39
no                   | no           | yes             | 1.73
no                   | no           | no              | 1.70

Table 1 shows the performance of the various previously existing Core User selection methods when tested using the item vectors testing error; each method was used to collect 13000 Core Users, or a little under 10 percent of the original MovieLens ml-20m dataset size. This is half the number of Core Users necessary to maintain recommendation accuracy as claimed in (Zeng et al., 2014), but our aim is comparing the viability of these methods. To test each method we used 13 row blocks, or 1000 users per block. We found that the performance is sensitive to the number of item vectors retained as latent factors in line 4 of Algorithm 2; we discuss reasons for this further down. We chose to use 20 percent of the singular values, for a total of 2600 latent factors across all the blocks. To make the error on the probe entry sets comparable across methods, we stopped running Algorithm 2 when the error on the training set had reached 0.2. The average absolute value of an entry in the ratings matrix, after centering, equals the error obtained by always predicting that a missing rating is the mean, zero. We therefore ran Algorithm 2 for the various Core User methods until the error on the training set was better than simply always guessing the mean.

The first column in Table 1 indicates whether or not hidden ratings taken from the item similarities were incorporated into the calculation of the user similarities. The second column indicates whether the cosine similarity of the user vectors is taken using the actual item ratings or just the item booleans. The third column indicates whether we used a frequency-based or a rank-based approach to select the Core Users from the aggregated lists of the top-50 most similar users to each other user. These results support previous literature suggesting that the best method for selecting Core Users considers item similarity, compares ratings rather than booleans, and uses a rank-based selection approach. For all methods used in Table 1, Algorithm 4 returns approximately the same error, which is better than the error obtained when using only one centroid with the mean at the origin.

Figures 1 and 2 help to explain why the performance of Algorithm 2 is sensitive to the number of item vectors retained as latent factors in line 4. They compare the singular values of the 13000 Core User ratings matrix, where the Core Users are selected using the most competitive selection method (the method used in the second row of Table 1), to the singular values of an equivalently sparse matrix with random ratings as the non-zero values. Singular values lay out the behavior of a matrix as a mapping between spaces, and the Marchenko-Pastur theorem (Marchenko and Pastur, 1967) describes the asymptotic behavior of the singular values of large random matrices. Figures 1 and 2 show that the singular values of the Core User ratings matrix only significantly differ from those of a random matrix in the lowest order singular values, which are, by convention, the largest singular values. In fact, beyond the leading singular values, the singular values of the Core User ratings matrix differ by at most 3.6 from those of a random matrix, with larger order singular values contributing less influence over the behavior of the matrix. So, while the larger order singular values of the Core User ratings matrix are non-zero, which is generally how we measure relevance, this relationship suggests that the larger order singular vectors may not be informative and may actually make learning more difficult by adding noise.
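The spectral comparison described here can be reproduced in miniature with NumPy. The matrix sizes, the density, and the low-rank-plus-noise stand-in for the Core User ratings matrix are all assumptions for illustration; the paper's comparison uses the actual 13000 Core User ratings.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, density = 200, 100, 0.1
mask = rng.random((m, n)) < density          # shared sparsity pattern

# equivalently sparse matrix with random ratings as the non-zero values
random_ratings = rng.normal(size=(m, n)) * mask

# structured stand-in: a rank-3 signal plus noise, observed through the same mask
signal = rng.normal(size=(m, 3)) @ rng.normal(size=(3, n))
structured = (signal + 0.1 * rng.normal(size=(m, n))) * mask

sv_rand = np.linalg.svd(random_ratings, compute_uv=False)
sv_struct = np.linalg.svd(structured, compute_uv=False)
```

Plotting `sv_struct` against `sv_rand` shows the structured spectrum dominating mainly in the leading (lowest-order, largest) singular values, echoing the paper's observation that the informative part of the Core User spectrum is concentrated at the top.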
Figure 1: Singular values of the 13000 Core User ratings matrix, where Core Users are selected from the ml-20m dataset (Harper and Konstan, 2015) using the most competitive selection method (the method used in the second row of Table 1), compared to the singular values of an equally sparse matrix with random ratings as the non-zero values. There are 13000 singular values, where by convention lower order singular values have larger value. The x-axis is labeled as the order percent, so the i-th singular value appears at order percent 100·i/13000.

Figure 2: Absolute value of the difference between the curves in Figure 1.

We now move to discussing the results of learning ACUs using Algorithm 1. We initialized our ACUs as Core Users selected with the most competitive selection method: the method used in the second row of Table 1. Figure 3 shows the item vectors testing error of 13000 ACUs over iterations, where by an iteration of Algorithm 1 we mean one pass of the loop beginning on line 3. For each test, we averaged the results of Algorithm 2 over 20 runs, where each run was given 200 independent testing users and either 2600 or 650 ACU item vectors. As in our Core User tests, in each run we stopped Algorithm 2 when we reached a training error of 0.2. In both testing and training we used 13 row blocks, or 1000 ACUs per block. Whether we use 2600 or 650 ACU item vectors, we can see clear improvement compared to the real Core User item vectors testing error, which is simply the y-intercept of these graphs. The best item vectors testing error using 650, or 5 percent, of the 13000 ACU item vectors is better than the item vectors testing error of 27000 Core Users (about 20 percent of the users in the complete dataset) when using 650 Core User item vectors: the best item vectors testing error using 650 of the 13000 ACU item vectors is 0.987, while the item vectors testing error of 27000 Core Users when using 650 Core User item vectors is 1.03.

Since the Core Users are stored in the same memory format as the original complete dataset, retaining only a fraction of the users as Core Users reduces memory in the same proportion. The memory reduction of the Artificial Core Users depends on the number of latent factors one decides to store: storing only a subset of the resulting latent factors, both user and item vectors, still yields a reduction in memory compared to the original dataset, and storing only the item vectors reduces memory further.
Additionally, the improvement in the item vectors testing error when using only 5 percent of the item vectors in Algorithm 2, compared to using 20 percent of the item vectors as shown in Figure 3, suggests that the extra vectors may be more noisy than informative.
Figure 3: Item vectors testing error of 13000 ACUs over iterations using the ml-20m dataset (Harper and Konstan, 2015). For each test, we averaged the results of Algorithm 2 over 20 runs, where each run was given 200 independent testing users and (left) 2600 / (right) 650 ACU item vectors. In each run, we stopped Algorithm 2 when we reached a training error of 0.2.

Admittedly, Algorithm 1 learns relatively slowly; one pass of the loop beginning on line 3 can take up to 4 minutes to complete. We stopped running Algorithm 1 after 180 iterations, at which point we reached an item vectors testing error of 1.36 with 2600 ACU item vectors and 0.99 with 650 ACU item vectors, which outperforms the most competitive real Core User methods. We also improved the sparse means error slightly, so our ACUs are slightly better centroids for the testing users than 13000 real Core Users are. Figure 4 shows the sparse means error of our ACUs over iterations, calculated with Algorithm 4. This second improvement is marginal, but it demonstrates that our ACUs still resemble an average group of real users.
Figure 4: Sparse means error of 13000 ACUs over iterations using the ml-20m dataset (Harper and Konstan, 2015).

We were able to condense the information of 27000 sparse Core Users into 13000 ACUs, approximately half the number of users. Since ACUs carry dense information, storing the retained latent factors of the 13000 ACUs takes up about three times as much memory as 27000 sparse Core Users do, but those latent factors achieve a smaller item vectors testing error than the same number of Core User latent vectors when using 27000 sparse Core Users. We can see from Figure 5 that the largest singular values of the ACU ratings matrix, which matched the largest singular values of the ratings matrix of 13000 Core Users at initialization, have shifted toward the largest singular values of the ratings matrix of 27000 Core Users after 180 training iterations.
Figure 5: The 650 largest singular values of the 13000 trained ACU ratings matrix using the ml-20m dataset (Harper and Konstan, 2015), compared to the largest singular values of the ratings matrix of 27000 Core Users (about 20 percent of the users in the complete dataset) and the largest singular values of the ratings matrix of 13000 Core Users (a little under 10 percent of the users in the complete dataset).
We have shown that our Artificial Core Users improve the recommendation accuracy of real Core Users while mimicking real user data. Because they act as good centroids for the complete dataset, they can be considered good representatives of real user clusters. They can be stored efficiently, yet they have dense ratings information, which is more immediately interpretable than sparse data. We have removed the representation limits of Core Users and shown that an iterative training process can improve the recommendation accuracy of Core Users, producing data that continues to resemble that of real users, conserves memory and improves recommendation efficiency.
References
Charu C. Aggarwal. Recommender Systems. Springer International Publishing Switzerland, 2016.
Leo Breiman. Arcing classifier (with discussion and a rejoinder by the author). Ann. Stat., 26:801–849, 1998.
Gaofeng Cao and Li Kuang. Identifying core users based on trust relationships and interest similarity in recommender system. IEEE International Conference on Web Services, 2016.
Konstantina Christakopoulou and Arindam Banerjee. Adversarial recommendation: Attack of the learned fake users. arXiv, 2018. URL arXiv:1809.08336.
F. Maxwell Harper and Joseph A. Konstan. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, 2015.
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural collaborative filtering. WWW, 2017.
Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. ICDM, 2008.
Prateek Jain and Inderjit S. Dhillon. Provable inductive matrix completion. arXiv, 2013. URL arXiv:1306.0626.
Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. KDD, 2008.
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 2009.
Li Kuang, Gaofeng Cao, and Liang Chen. Extracting core users based on features of users and their relationships in recommender systems. International Journal of Web Services Research, 14, 2017.
Shyong K. Lam and John Riedl. Shilling recommender systems for fun and profit. WWW, pages 393–402, 2004.
Qibing Li, Xiaolin Zheng, and Xinyue Wu. Neural collaborative autoencoder. arXiv, 2018. URL arXiv:1712.09043.
S. Li, J. Kawale, and Y. Fu. Deep collaborative filtering via marginalized denoising auto-encoder. CIKM, 2015.
Zhang Li, Yu Lei, and Cao Shuyan. Constructing the core user set for collaborative recommendation based on samples selection idea. International Journal of u- and e- Service, Science and Technology, 9:27–34, 2016.
V. A. Marchenko and L. Pastur. Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4):457–483, 1967.
Michael P. O'Mahony, Neil J. Hurley, and Guenole C. M. Silvestre. Promoting recommendations: An attack on collaborative filtering. International Conference on Database and Expert Systems Applications, pages 494–503, 2002.
Al Mamunur Rashid, Istvan Albert, Dan Cosley, Shyong K. Lam, Sean M. McNee, Joseph A. Konstan, and John Riedl. Getting to know you: Learning new user preferences in recommender systems. IUI, 2002.
Al Mamunur Rashid, George Karypis, and John Riedl. Learning preferences of new users in recommender systems: An information theoretic approach. KDD, 10:90–100, 2008.
Badrul M. Sarwar, Joseph A. Konstan, Al Borchers, Jon Herlocker, Brad Miller, and John Riedl. Using filtering agents to improve prediction quality in the grouplens research collaborative filtering system. Computer Supported Cooperative Work, pages 345–354, 1998.
Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental singular value decomposition algorithms for highly scalable recommender systems. ICCIT, 2002.
Robert E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
D. Sculley. Web-scale k-means clustering. WWW, pages 1177–1178, 2010.
S. Sedhain, A. K. Menon, S. Sanner, and L. Xie. Autorec: Autoencoders meet collaborative filtering. WWW, 2015.
Yanir Seroussi, Fabian Bohnert, and Ingrid Zukerman. Personalised rating prediction for new users using latent factor models. ACM Conference on Hypertext and Hypermedia, pages 47–56, 2011.
Y. Shi, M. Larson, and A. Hanjalic. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. CSUR, 2014.
F. Strub, J. Mary, and R. Gaudel. Hybrid collaborative filtering with autoencoders. arXiv, 2016. URL arXiv:1603.00806.
H. Wang, N. Wang, and D.-Y. Yeung. Collaborative deep learning for recommender systems. KDD, 2015.
H. Wang, S. Xingjian, and D.-Y. Yeung. Collaborative recurrent autoencoder: Recommend while learning to fill in the blanks. NIPS, 2016.
Y. Wu, C. DuBois, A. X. Zheng, and M. Ester. Collaborative denoising auto-encoders for top-n recommender systems. WSDM, 2016.
Wei Zeng, An Zeng, Hao Liu, Ming-Sheng Shang, and Tao Zhou. Uncovering the information core in recommender systems. Scientific Reports, pages 6140–6152, 2014.
F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W.-Y. Ma. Collaborative knowledge base embedding for recommender systems. KDD, 2016.
S. Zhang, L. Yao, and A. Sun. Deep learning based recommender system: A survey and new perspectives. arXiv, 2017. URL arXiv:1707.07435.
Xiao Zhang, Simon S. Du, and Quanquan Gu. Fast and sample efficient inductive matrix completion via multi-phase procrustes flow.