Destination similarity based on implicit user interest
Hongliu Cao*, Eoin Thomas**
Amadeus S.A.S., AI Research Department, 485 Route du Pin Montard, BP 69, 06902 Sophia Antipolis Cedex, France
E-mail: [email protected]
Abstract—With the digitization of the travel industry, it is more and more important to understand users from their online behaviors. However, online travel industry data are more challenging to analyze due to extra sparseness, dispersed user history actions, fast change of user interest and lack of direct or indirect feedback. In this work, a new similarity method is proposed to measure destination similarity in terms of implicit user interest. By comparing the proposed method to several other widely used similarity measures in recommender systems, the proposed method achieves a significant improvement on travel data.
Index Terms—Destination similarity, Travel industry, Recommender System, Implicit user interest
I. INTRODUCTION
The travel industry relies more and more on e-commerce nowadays, as online solutions have made life more convenient and comfortable for everyone. However, unlike the online shopping industry, the travel industry is more complicated to analyse in four ways: 1. User data are much more sparse. A user may have a lot of search and purchasing history on Amazon during a short period such as one month, but in the travel industry, a passenger may only reserve flight tickets once a year. 2. On platforms such as Amazon, each good/item has a clear hierarchical category (e.g., diapers belong to the category of Baby care, which belongs to the higher-level category of Baby), which may be used as the definition of user interest. In this application for the travel industry, we consider the destination as the item, such as Paris, London or Shanghai, but it is hard to define a category for Paris in terms of explicit user interest. 3. An account with user information is necessary for online purchasing on Amazon, which makes it easier to group the history of searches and purchases by user. For online flight bookings, in contrast, the user does not have to create an account during the booking phase. Travelers often book flight tickets, train tickets, hotels and activities on different platforms, which makes it harder to group the purchases by user. 4. Most online products such as movies or Amazon products have ratings, as users can give feedback easily. However, given that a traveler has been to Paris for example, it is hard to get a rating from the traveler on the Paris trip, because a trip includes ticket booking, hotel booking, Point of Interest (POI) visits, food, etc.; with so many variables, it is difficult for travelers to rate a trip to a destination as a whole. All these differences make it a great challenge to collect, analyse and understand user data in the travel industry.

One important way to help understand traveler trends is destination similarity.
Destination similarity is very important for the travel industry:
1) For travelers: we want to help travelers find similar destinations that they may be interested in. For example, with the search or booking history of a traveler, similar destinations can be recommended to the traveler for the next trip. Another example is that the unprecedented COVID-19 crisis has introduced a number of travel restrictions which prevent leisure travelers from reaching their dream destination. With destination similarity, we can recommend alternative non-restricted destinations they might also be interested in.
2) For tourism offices: tourism offices can better identify their competitors for each origin market. This can allow them to better distinguish themselves and target travelers considering trips to similar locations.
3) For online advertising companies: destination similarity can be used to identify whether the current user who is searching for destination A would be interested in their impression of destination B, to improve the click-through rate or conversion rate.
4) For a sustainable travel industry: destination similarity can be used to suggest destinations that travelers might be willing to visit (so with the potential to convert a search into a booking), but closer to their home or simply better served by direct flights, thus reducing the CO2 emissions linked to transport. It can also be a solution to over-tourism problems, by recommending similar destinations with fewer tourists or offering a more local and authentic experience, making travel even more rewarding.

In this work, we propose to measure destination similarity from search logs based on anonymized cookie IDs. Various similarity measures (among users or items) have been proposed in the literature for collaborative filtering (CF) based recommender systems. However, most of these measures are based on users' ratings, while there is no rating information for a destination (city) in the travel industry.
This makes many similarity measures unsuitable for our problem. To fill this gap in the literature, we investigate different possible solutions and propose a new similarity measure, which has superior performance to the state-of-the-art methods.

The remainder of the paper is organized as follows: the background and related works are introduced in Section II. In Section III, the proposed similarity measures are introduced. We describe the data sets chosen in this study and provide the protocol of our experiments in Section IV, and the experimental results in Section V. The final conclusion and future works are given in Section VI.

II. BACKGROUND AND RELATED WORKS
Generally speaking, before a traveler makes a booking for a holiday, there are two preliminary steps: inspiration and search. There are many information sources that can inspire travelers, such as a movie scene or recommendations from relatives. During the inspiration step, the user interest is broad and general, thus we can only estimate the implicit user interest. With enough accumulated motivation, travelers will then search for more detailed information from travel websites, blogs or reviews to enrich their general picture of the destination. Then, the traveler will start to search for flight and hotel information. If the prices agree with the budget, the traveler will pass to the next step: booking. Otherwise, the traveler may search for another similar destination and compare the price.

The general motivation here is that when a user searches for travel to a destination, this action shows that the user is interested in this destination. But we do not know what exactly the user is interested in (e.g., the museum, beach, mountains or other activities), which can be called 'implicit user interest'. The explicit user interest is difficult and expensive to obtain. Many companies ask the customers directly, while others try to infer it from customer shopping/spending behavior. However, apart from the costs, the explicit user interest may not be clear to travelers themselves either, because tourist attractions or POIs are not the only reason travelers get interested in a destination: it may also be due to the culture (local people, food, etc.), weather, events and so on. Capturing implicit user interest seems easier and more direct. When a user searches for both destination A and destination B, there must be some similarity between these two destinations for this user. However, user interest changes over time.
If the time difference between two searches is 10 months for example, these two destinations may not be similar, as the user interest may shift due to the season or travel context. Hence, limiting the time period between two different searches is important.

The research question is: given that a user has searched for several destinations, how can we determine the destination similarity? In the literature, there are many item similarity measures that can be applied. For example, in recommender systems, CF has become the most widely used method to recommend items for users [1], [2]. The core of CF is to calculate similarities among users or items [3]. Destination similarity can be seen as recommending destinations for users from the point of view of CF.

The classic CF problem has a rating matrix. Let R = [r_{u,i}]_{m \times n} be a rating matrix with m users and n items. Each entry r_{u,i} is a rating value given by user u to item i. There are different ranges of r_{u,i} in real-world datasets, among which the range {1, 2, 3, 4, 5} is adopted in many datasets such as movie reviews and restaurant reviews. Many predefined similarity measures can be used for CF. The most commonly used similarity measures are the Pearson correlation coefficient (PCC) [4] and the Cosine similarity (Cos) [5]:

PCC(u, v) = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2} \sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}}  (1)

Cos(u, v) = \frac{\sum_{i \in I} r_{u,i} r_{v,i}}{\sqrt{\sum_{i \in I} r_{u,i}^2} \sqrt{\sum_{i \in I} r_{v,i}^2}}  (2)

where I represents the set of items commonly rated by users u and v, and \bar{r}_u and \bar{r}_v are the average rating values of users u and v respectively.

Many variants of PCC and Cos have also been proposed. In [6], the Constrained Pearson Correlation Coefficient (CPCC) has been proposed to take the negative rating values into account. Other studies suggest that the number of co-rated items can impact the performance of the similarity measure.
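As a concrete illustration, Eqs. (1)-(2) can be sketched in plain Python. This is an illustrative sketch, not code from the paper; the helper names `pcc` and `cosine` and the dict-based rating representation are our own assumptions.

```python
# Sketch of the two classic CF measures: Pearson correlation (Eq. 1)
# and cosine similarity (Eq. 2) over the items co-rated by two users.
import math

def pcc(ratings_u, ratings_v):
    """Pearson correlation over common items; ratings are dicts item -> rating."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)

def cosine(ratings_u, ratings_v):
    """Cosine similarity over the common items of two users."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    num = sum(ratings_u[i] * ratings_v[i] for i in common)
    den_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    den_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return num / (den_u * den_v)
```

Both measures return 0 when the two users share no rated items, which is precisely the regime that makes them weak on sparse travel data.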
Hence, the Weighted Pearson Correlation Coefficient (WPCC) [7] and the Sigmoid function based Pearson Correlation Coefficient (SPCC) [8] have also been proposed.

Another widely used similarity measure is the Jaccard similarity coefficient, one of the popular methods for calculating the similarity between users/items [9], [10]:

Jaccard(u, v) = \frac{|I_u \cap I_v|}{|I_u \cup I_v|}  (3)

where I_u and I_v are the sets of items rated by users u and v respectively. Unlike the previous two similarity measures, the Jaccard similarity considers only the number of co-rated items between two users regardless of the rating values, which seems suitable for our problem.

Apart from the previously introduced widely used similarity measures, there are some recent advances in the literature too. In [2], the authors list the limitations of popular similarity measures used in CF and propose a new similarity measure by combining the percentage of non-common ratings with the absolute difference of ratings. However, all these similarity measures are proposed for measuring user similarities with item ratings [10], [11]. This is not adapted to recommending destinations to travelers for two reasons. Firstly, there are no ratings for destinations from travelers: from search logs, we can only get a binary response indicating whether a traveler searched a destination or not. Secondly, due to the change in user interest, only recent searches should be used to recommend destinations to travelers, which means that there are very few searched cities per user, and it is difficult to measure user similarity with so little information. Hence, we need to measure the destination similarity instead of the user similarity. We propose a new similarity measure for items without any user ratings in the next section, and apply it to destination similarity in our experiments.

III. PROPOSED SIMILARITY MEASURES
To recommend a destination to a traveler who has made one or more searches recently, we can directly measure the destination similarity and recommend destinations similar to the traveler's recent searches.

Let R = [r_{u,i}]_{m \times n} be a binary matrix with m users and n destinations. r_{u,i} = 1 means user u recently searched destination i, while r_{u,i} = 0 means user u did not search destination i. The matrix R is very sparse for two reasons: 1. there are many destinations, while each traveler only knows a few of them; 2. people do not plan travel frequently. As the matrix is binary, many commonly used CF similarity measures based on ratings are less meaningful.

In this work, a simple and easy to understand similarity measure is proposed, inspired by Random Forest Similarity (RFS) [12], [13]. The Random Forest (RF) classifier [14] is one of the most successful and widely used classifiers. An RF H is an ensemble made up of M decision trees, denoted as in Equation (4):

H(X) = \{h_k(X), k = 1, \dots, M\}  (4)

RFS is a similarity measure inferred from RF, which is also widely used [15]-[17]. For each tree h_k in the forest H, if two different instances X_i and X_j fall in the same terminal node, they are considered similar:

RFS^{(k)}(X_i, X_j) = \begin{cases} 1, & \text{if } l_k(X_i) = l_k(X_j) \\ 0, & \text{otherwise} \end{cases}  (5)

where l_k(X) is a function that returns the leaf node of tree k given input X. The final RFS measure RFS^{(H)} consists in calculating the RFS^{(k)} value for each tree h_k in the forest, and averaging the resulting similarity values over the M trees as in Equation (6):

RFS^{(H)}(X_i, X_j) = \frac{1}{M} \sum_{k=1}^{M} RFS^{(k)}(X_i, X_j)  (6)

Cluster Consensus Similarity (CCS): RFS is designed mostly for classification problems, so it is not suitable for the destination similarity problem, where only a user-destination interaction matrix is available. However, inspired by the idea of RFS, we propose a simple method named CCS. For RFS, each tree is trained to give a different data partition: each leaf node groups one or several instances together. In this work, the destinations searched by each user can be seen as a cluster, and the destinations in this cluster share some similarity in terms of this user's interest. With this intuition, the similarity between destinations i and j for user u can be defined as:

CCS^{(u)}(i, j) = \begin{cases} 1, & \text{if } r_{u,i} = r_{u,j} = 1 \\ 0, & \text{otherwise} \end{cases}  (7)

Similar to RFS, the final similarity between destinations i and j is then averaged over all users:

CCS^{(U)}(i, j) = \frac{1}{m} \sum_{u=1}^{m} CCS^{(u)}(i, j)  (8)

With the proposed CCS measure, an n \times n destination similarity matrix can be built as in Equation (9):

S^{CCS} = \begin{pmatrix} CCS^{(U)}(1,1) & CCS^{(U)}(1,2) & \dots & CCS^{(U)}(1,n) \\ CCS^{(U)}(2,1) & CCS^{(U)}(2,2) & \dots & CCS^{(U)}(2,n) \\ \vdots & \vdots & \ddots & \vdots \\ CCS^{(U)}(n,1) & CCS^{(U)}(n,2) & \dots & CCS^{(U)}(n,n) \end{pmatrix}  (9)

In the similarity matrix, each row is a similarity vector representing the similarity of one destination to all other destinations. To avoid recommending an already searched destination, the diagonal values (the similarity of a destination to itself) are set to 0.

Normed Cluster Consensus Similarity (CCS_norm): The proposed CCS method is simple and easy to understand. The destination similarity for each user is calculated first, to reflect this user's travel interest. Then, the similarity values of all users are averaged, which can be seen as a majority voting process.

However, there are two requirements for a similarity measure. The first one is that each similarity vector provides a good ranking, so that it can answer which are the most similar destinations given a known destination. The second one is to provide a solution when given multiple inputs. This requires that different similarity vectors (e.g., S^{CCS}_{i,\cdot} and S^{CCS}_{j,\cdot}) be comparable, so that simple operations such as summing make sense. CCS meets the first requirement, but not the second one, because less popular destinations have very small similarity values, especially when the user number m is very large, while popular destinations have larger values. This means that if we average two similarity vectors to find destinations similar to both of two searched cities, the popular one dominates the averaged result: we focus only on destinations similar to the popular one and ignore the less popular one. To mitigate this effect, CCS_norm is proposed to re-scale each similarity vector:

S^{CCS_{norm}}_{i,\cdot} = \frac{S^{CCS}_{i,\cdot}}{\max(S^{CCS}_{i,\cdot})}  (10)

Popularity based Cluster Consensus Similarity (PCCS): The proposed CCS_norm method re-scales the similarity vectors of all destinations to the same range [0, 1] so that they are comparable between destinations. CCS_norm can help avoid the situation where a popular destination dominates the final similarity values when merged with less popular destinations. However, this brings another problem. Popular destinations are searched by most people, which means there are more data, and so we have more confidence in the similarity vector of popular destinations. For example, if an unpopular destination i has been searched by 2 users among 1 million users, and another unpopular destination j has been searched by another 2 users, the similarity between i and j is 0. However, the fact may be that they are quite similar; we simply do not have enough data to support their similarity. Hence, we have more confidence in the similarity vectors of popular destinations. With this intuition, PCCS is proposed based on CCS_norm:

S^{PCCS}_{i,\cdot} = \frac{1}{1 + e^{p_i - S^{CCS_{norm}}_{i,\cdot}}}  (11)

where p_i is the popularity of destination i, defined on b_i, the rank of destination i based on the number of searches:

p_i = 1 - w \times \frac{b_i}{n}  (12)

Here, w \in (0, 1] is a parameter to control the difference between the similarity values for popular and unpopular destinations. This allows a trade-off between putting more confidence on popular destinations (w > 0) and putting the same confidence on all destinations (w = 0). When w is bigger, the popular destination vectors are weighted more. However, a too big w may lead to over-focusing on popular destinations and decrease the recommendation diversity.

IV. EXPERIMENTS
A. Description of datasets
In this work, we use destination search data. In this dataset, we use an anonymized user cookie ID to group the searches of the same user together. Users from 5 countries are selected due to business interest. The data contain activity related to search sessions, from which only 3 columns are preserved: the anonymized user cookie ID, the searched destination location and the country from which the search was made. This last field is only used to create multiple market-specific datasets to preserve cultural differences in destination preferences.
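The three measures proposed in Section III can be sketched directly on such a binary user-destination matrix. The following NumPy sketch is illustrative only, not the authors' implementation; in particular, the normalisation of the rank b_i by the number of destinations n in Eq. (12) is our assumption.

```python
# Illustrative sketch of CCS (Eq. 8), CCS_norm (Eq. 10) and PCCS (Eqs. 11-12)
# on a binary m x n user-destination search matrix R.
import numpy as np

def ccs(R):
    """CCS: fraction of users who co-searched destinations i and j."""
    m = R.shape[0]
    S = (R.T @ R) / m                 # S[i, j] = (1/m) * sum_u r_ui * r_uj
    np.fill_diagonal(S, 0.0)          # never recommend the searched city itself
    return S

def ccs_norm(S_ccs):
    """CCS_norm: rescale each similarity vector to [0, 1] by its maximum."""
    row_max = S_ccs.max(axis=1, keepdims=True)
    row_max[row_max == 0] = 1.0       # guard against all-zero rows
    return S_ccs / row_max

def pccs(R, w=0.5):
    """PCCS: popularity-weighted sigmoid of the normed similarity vectors."""
    n = R.shape[1]
    S_norm = ccs_norm(ccs(R))
    searches = R.sum(axis=0)
    order = np.argsort(-searches)     # b_i = rank by search count (1 = most searched)
    b = np.empty(n)
    b[order] = np.arange(1, n + 1)
    p = 1.0 - w * b / n               # Eq. (12); normalisation by n is assumed
    return 1.0 / (1.0 + np.exp(p[:, None] - S_norm))
```

Note that the sigmoid in Eq. (11) makes the diagonal nonzero again, so in practice the already-searched destinations would still be masked out at recommendation time.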
B. Protocol of experiments
The main objective of our experiments is to find the most suitable similarity measure for search logs (binary user-item interactions). Apart from comparing different measures, cultural differences and user interest change should also be taken into consideration.
Culture difference
The travel preferences of a French person, for example, may differ from those of a Chinese person. To deal with this challenge, we take country/culture into account when measuring destination similarity: the destination similarity is measured per country from which the search was made.
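This per-market split can be sketched as follows (a hypothetical helper, assuming the three preserved columns arrive as (user, destination, search-country) tuples):

```python
# Group search logs by the country the search was made from, so that one
# similarity matrix can later be computed per market.
from collections import defaultdict

def split_by_market(logs):
    """logs: iterable of (user_id, destination, search_country) tuples.
    Returns {country: [(user_id, destination), ...]}."""
    markets = defaultdict(list)
    for user, dest, country in logs:
        markets[country].append((user, dest))
    return dict(markets)
```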
User interest change
For most countries, user interest can change over time. For example, summer destinations are usually different from winter destinations. To cope with this challenge, we regularly update the destination similarity to adapt to this shift in user interest. In the previous solution, the most recent two months of data are used to calculate the similarity matrix, and the results are updated weekly.
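The rolling update described above might be sketched as follows (a hypothetical helper; the exact production window logic is not disclosed in the paper):

```python
# Keep only the searches from a sliding training window, e.g. the last
# eight weeks (roughly two months) preceding the weekly refresh date.
from datetime import date, timedelta

def training_window(logs, as_of, weeks=8):
    """logs: list of (user_id, destination, search_date) tuples.
    Returns the (user_id, destination) pairs inside the window."""
    start = as_of - timedelta(weeks=weeks)
    return [(u, d) for (u, d, t) in logs if start <= t < as_of]
```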
Training: For the training procedure, we have data from 5 countries. For each country, 10 time periods are selected across 2019 and 2020. The three proposed methods CCS, CCS_norm and PCCS are compared to 4 widely used similarity measures from the literature: Cosine similarity, Pearson similarity, Jaccard similarity and Kulsinski similarity. Like the Jaccard similarity, the Kulsinski similarity is also a measure for binary vectors. It is less popular, but it has been tested in many different fields [18]-[20] and has achieved some good results [21]. For the PCCS method, different w values are tested and the best w is selected.

Testing: The similarity matrix is calculated from the 8 weeks of data, and tested on the data of the following week. During the test phase, we randomly mask one searched destination for each user, and use the rest of the searched destinations to predict the masked destination. To realize this, the average of the remaining searched destinations' similarity vectors is calculated. Then, we check whether the masked destination is in the list of the top 5 most similar destinations.

V. RESULTS
A. Comparison of different similarity measures
To compare the seven similarity measures, experiments on the data of 5 countries, with 10 time periods for each country, have been carried out. The means and standard deviations of the (relative) top-5 accuracy over the 10 periods for each method are shown in Table I. The results illustrate that among all five countries, PCCS always has the best performance, followed by CCS_norm. Pearson is always the worst performing method among all similarity measures. Table I gives more details on the comparison. On average, the proposed PCCS improves the performance of Pearson by 7.01%. Compared to the previously selected method Cos, PCCS gains an improvement of 4.44% on average.

From the results of the average ranking in Table I, it can be observed that the two best performing methods are PCCS and CCS_norm, while the widely used similarity measures Cos and Pearson are the worst. One reason is that these two measures are not designed for the comparison of binary vectors.

The Wilcoxon-Holm post hoc test with Critical Differences (CD) is also performed to obtain an overall statistical comparison. The statistical test result is shown in Figure 1. Generally speaking, all three CCS-based methods are significantly better than Pearson and Cos. The proposed PCCS is significantly better than all other 6 similarity measures.
Fig. 1. The CD diagram according to the Wilcoxon-Holm post hoc test result when alpha is 0.00001. Methods connected with a bold black line have no significant difference.
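For reference, the masked-destination evaluation from Section IV can be sketched as follows (an illustrative sketch, not the authors' evaluation code; `top5_accuracy` and its arguments are hypothetical names):

```python
# Top-5 masked-destination evaluation: for each test user, hide one searched
# destination, average the similarity vectors of the remaining searches, and
# count a hit if the hidden city ranks in the top 5.
import numpy as np

def top5_accuracy(S, test_users, rng=np.random.default_rng(0)):
    """S: n x n destination similarity matrix.
    test_users: list of lists of destination indices (each with >= 2 searches)."""
    hits = 0
    for searched in test_users:
        masked = searched[int(rng.integers(len(searched)))]
        rest = [d for d in searched if d != masked]
        scores = S[rest].mean(axis=0)      # average the remaining similarity vectors
        scores[rest] = -np.inf             # don't re-recommend already-searched cities
        top5 = np.argsort(-scores)[:5]
        hits += int(masked in top5)
    return hits / len(test_users)
```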
B. Comparison of different training size
In the previous section, all the experiments used the previous eight weeks of data as training data for the following week of test data. However, using eight weeks of training data for each market and updating the results weekly can be very time
TABLE I
THE EXPERIMENTAL RESULTS ON FIVE COUNTRIES' DATA, WITH TEN TIME PERIODS FOR EACH COUNTRY. THE MEAN AND STANDARD DEVIATION OF THE TOP-5 ACCURACY (RELATIVE) IS SHOWN. THE BASELINE IS THE PEARSON METHOD, AND THE NUMBERS IN THIS TABLE SHOW THE IMPROVEMENT AGAINST THE PEARSON METHOD.

Methods: Pearson | Cos | Jaccard | Kulsinski | CCS | CCS_norm | PCCS
Avg rank: 7.00 | 5.60 | 5.40 | 3.20 | 3.60 | 2.20 | 1.00
Avg improvement: 0.00 | 2.56% | 2.88% | 5.89% | 5.74% | 6.31% | 7.01%
TABLE II
THE EXPERIMENTAL RESULTS WITH FOUR WEEKS OF TRAINING DATA INSTEAD OF EIGHT WEEKS. THE MEAN AND STANDARD DEVIATION OF THE TOP-5 ACCURACY (RELATIVE) IS SHOWN. THE BASELINE IS THE PEARSON METHOD, AND THE NUMBERS IN THIS TABLE SHOW THE IMPROVEMENT AGAINST THE PEARSON METHOD.

Methods: Pearson | Cos | Jaccard | Kulsinski | CCS | CCS_norm | PCCS
Avg rank: 7.00 | 6.00 | 5.00 | 2.80 | 4.00 | 2.20 | 1.00
Avg improvement: 0.00 | 2.39% | 2.88% | 5.97% | 5.87% | 6.36% | 7.07%

consuming and computationally expensive. In this section, we reduce the training data to four weeks only, to see to what extent the prediction performance is affected.

The experimental results are presented in Table II. Similar to the analysis in the previous section using eight weeks of training data, the results trained on only four weeks of data also show that the proposed PCCS is the best performing method, while Cos and Pearson are the worst. On average, PCCS increases the performance of Pearson by 7.06% and the performance of Cos by 4.67%, which is similar to the conclusion of the previous section.

Moreover, compared to the results in Table I, the average performance of each similarity measure is not strongly impacted by the reduction of the training data size. The method with the biggest difference is Cos, with a reduction of 0.65% in accuracy. PCCS has a performance reduction of 0.42% on average. But when we look into each country, it can be found that the differences on Country2, Country3, Country4 and Country5 are negligible, while the performance on Country1 drops by around 1.76% (this analysis is based on the comparison of the absolute performances, which are not disclosed). One possible reason is that the data coverage in Country1 is not very good and the data size is much smaller than in the other countries, which can also explain the bigger performance drop on Country1 compared to the other countries.

The objective of this experiment is to answer the research question: how much data is enough to obtain good prediction performance while keeping computational efficiency? The experimental results show that for countries with good data coverage, four weeks of training data are enough compared to eight weeks. However, for countries without proper data coverage, it is better to use eight weeks of training data.
C. Comparison of data from 2019 and 2020
Due to the covid-19 crisis, the data volume of 2020 is much smaller than that of 2019. The question in this section is: should we use data from 2019 or from 2020 as training data for the prediction of 2020?

To answer this question, we choose the first week of June 2020 as the test data, and use the data from April and May of 2019 and of 2020 respectively as training data to compare their prediction performance. We chose this time period because there was confinement in Europe during April and May 2020, and the data volume is lower compared to the same period in 2019.

The experimental results are shown in Figure 2. Globally speaking, 2020 data give better prediction performance than 2019 data, even though the 2020 data volume is much smaller. The smallest difference occurs in Country3 with a 1.05% accuracy gap, while the biggest difference occurs in Country4 with a 4.75% performance gap. One possible reason may be the change of user interest. Another possible reason may be that the covid-19 crisis has changed user behavior (e.g., more local travel than international travel).
Fig. 2. The comparison of the prediction performance between 2019 data and 2020 data. X-axis is the country, Y-axis is the accuracy.
VI. CONCLUSION
In this work, we have presented the challenges of recommending destinations in the travel industry and the differences from recommending items in other e-commerce sectors. The challenges of extra sparseness, dispersed user history, change of user interest, and few direct or indirect feedbacks make it much harder to understand travelers and make recommendations than for other online consumers.

To tackle these challenges and to understand travelers, we decided to measure the destination similarity in terms of travelers' implicit interest. There are many similarity measures proposed in the field of collaborative filtering. However, most of these measures are designed for user-item interactions with ratings. Hence, we propose a new similarity measure for user-item interactions without ratings to deal with the challenges of the travel industry. The proposed PCCS is inspired by Random Forest Similarity to take user interest into account. The destination popularity is added to adjust the magnitude of each similarity vector so that the similarity vectors of different destinations can be fused correctly. After comparing seven different similarity measures on real-world data, the proposed PCCS proves to be the best solution.

However, some improvements can still be made. Firstly, we can expand from a single source to multiple sources. Users can be limited by their knowledge of destinations, which means that there are destinations users never search for, not because they are not interested in them but because they do not know them. Search logs can only reflect the similarity among destinations known to users. Hence, apart from the implicit user interest from search logs, other sources can be added, such as destination images or descriptions. By using multi-source data, the implicit user interest can be fused with the destination information to provide a more meaningful similarity measure. Secondly, more user information and session information can be collected to provide a more personalized similarity measure, although more input information may also limit its use in real-world use cases.

REFERENCES
[1] F. Ricci, L. Rokach, and B. Shapira, "Introduction to recommender systems handbook," in Recommender Systems Handbook, pp. 1-35, Springer, 2011.
[2] A. Gazdar and L. Hidri, "A new similarity measure for collaborative filtering based recommender systems," Knowledge-Based Systems, vol. 188, p. 105058, 2020.
[3] H. Liu, Z. Hu, A. Mian, H. Tian, and X. Zhu, "A new user similarity model to improve the accuracy of collaborative filtering," Knowledge-Based Systems, vol. 56, pp. 156-166, 2014.
[4] X. Su and T. M. Khoshgoftaar, "A survey of collaborative filtering techniques," Advances in Artificial Intelligence, vol. 2009, 2009.
[5] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, 2005.
[6] U. Shardanand and P. Maes, "Social information filtering: algorithms for automating 'word of mouth'," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 210-217, 1995.
[7] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, "An algorithmic framework for performing collaborative filtering," in ACM SIGIR Forum, vol. 51, pp. 227-234, ACM, 2017.
[8] M. Jamali and M. Ester, "Trustwalker: a random walk model for combining trust-based and item-based recommendation," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 397-406, 2009.
[9] T. Arsan, E. Köksal, and Z. Bozkus, "Comparison of collaborative filtering algorithms with various similarity measures for movie recommendation," International Journal of Computer Science, Engineering and Applications (IJCSEA), vol. 6, no. 3, pp. 1-20, 2016.
[10] G. Jain, T. Mahara, and K. N. Tripathi, "A survey of similarity measures for collaborative filtering-based recommender system," in Soft Computing: Theories and Applications, pp. 343-352, Springer, 2020.
[11] L. Al Hassanieh, C. Abou Jaoudeh, J. B. Abdo, and J. Demerjian, "Similarity measures for collaborative filtering recommender systems," pp. 1-5, IEEE, 2018.
[12] H. Cao, S. Bernard, R. Sabourin, and L. Heutte, "Random forest dissimilarity based multi-view learning for radiomics application," Pattern Recognition, vol. 88, pp. 185-197, 2019.
[13] H. Cao, Random Forest for Dissimilarity Based Multi-View Learning: Application to Radiomics. PhD thesis, Normandie Université; Université du Québec, École de technologie supérieure, 2019.
[14] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[15] T. Shi and S. Horvath, "Unsupervised learning with random forest predictors," Journal of Computational and Graphical Statistics, vol. 15, no. 1, pp. 118-138, 2006.
[16] H. Cao, S. Bernard, L. Heutte, and R. Sabourin, "Improve the performance of transfer learning without fine-tuning using dissimilarity-based multi-view learning for breast cancer histology images," in International Conference Image Analysis and Recognition, pp. 779-787, Springer, 2018.
[17] Z. Farhadi and D. Shahsavani, "Gene expression data clustering with random forest dissimilarity," 2015.
[18] R. A. Lewis, "Data patterns discovery using unsupervised learning," 2019.
[19] A. Smailagic, H. Y. Noh, P. Costa, D. Walawalkar, K. Khandelwal, M. Mirshekari, J. Fagert, A. Galdrán, and S. Xu, "MedAL: Deep active learning sampling method for medical image analysis," arXiv preprint arXiv:1809.09287, 2018.
[20] R. Levine, C. Lin, and Z. Wang, "Acquiring banking networks," tech. rep., National Bureau of Economic Research, 2017.
[21] V. Vinayan, K. Soman, et al., "AmritaNLP at SemEval-2018 Task 10: Capturing discriminative attributes using convolution neural network over global vector representation," in