Listener Modeling and Context-aware Music Recommendation Based on Country Archetypes

Abstract

Music preferences are strongly shaped by the cultural and socio-economic background of the listener, which is reflected, to a considerable extent, in country-specific music listening profiles. Previous work has already identified several country-specific differences in the popularity distribution of music artists listened to. In particular, what constitutes the "music mainstream" strongly varies between countries. To complement and extend these results, the article at hand delivers the following major contributions: First, using state-of-the-art unsupervised learning techniques, we identify and thoroughly investigate (1) country profiles of music preferences on the fine-grained level of music tracks (in contrast to earlier work that relied on music preferences on the artist level) and (2) country archetypes that subsume countries sharing similar patterns of listening preferences. Second, we formulate four user models that leverage the user's country information on music preferences. Among others, we propose a user modeling approach to describe a music listener as a vector of similarities over the identified country clusters or archetypes. Third, we propose a context-aware music recommendation system that leverages implicit user feedback, where context is defined via the four user models. More precisely, it is a multi-layer generative model based on a variational autoencoder, in which contextual features can influence recommendations through a gating mechanism. Fourth, we thoroughly evaluate the proposed recommendation system and user models on a real-world corpus of more than one billion listening records of users around the world (out of which we use 369 million in our experiments) and show its merits vis-a-vis state-of-the-art algorithms that do not exploit this type of context information.


[Preprint version] Listener Modeling and Context-aware Music Recommendation Based on Country Archetypes

MARKUS SCHEDL∗, Johannes Kepler University Linz, Institute of Computational Perception, Multimedia Mining and Search Group, and Linz Institute of Technology, AI Lab, Human-centered AI Group, Austria

CHRISTINE BAUER, Utrecht University, The Netherlands

WOLFGANG REISINGER, Johannes Kepler University Linz, Institute of Computational Perception, Multimedia Mining and Search Group, Austria

DOMINIK KOWALD, Graz University of Technology and Know-Center, Austria

ELISABETH LEX, Graz University of Technology and Know-Center, Austria

Keywords: music, recommender system, culture, country, clustering, context, user modeling, music preferences.

∗ Corresponding author: Markus Schedl, [email protected]. Authors' addresses: Markus Schedl, [email protected], Johannes Kepler University Linz, Institute of Computational Perception, Multimedia Mining and Search Group, Linz Institute of Technology, AI Lab, Human-centered AI Group, Linz, Austria; Christine Bauer, [email protected], Utrecht University, Utrecht, The Netherlands; Wolfgang Reisinger, Johannes Kepler University Linz, Institute of Computational Perception, Multimedia Mining and Search Group, Linz, Austria; Dominik Kowald, Graz University of Technology and Know-Center, Graz, Austria; Elisabeth Lex, Graz University of Technology and Know-Center, Graz, Austria. arXiv [cs.IR]. [Preprint version as of September 22, 2020]

Recommendation systems (or recommender systems) have become an important means to help users find and discover various types of content and goods, including movies, videos, books, and food [56]. As such, they represent substantial business value. In the music industry, recommender systems, powered by machine learning and artificial intelligence, have radically changed the market; they have even become major drivers in this industry. Essentially, music recommender systems (MRS) shape today's digital music distribution [66] and have become vital tools for marketing music to a targeted audience, as evidenced by the success of music streaming services that feature recommender systems, such as Spotify, Deezer, or Apple Music. While MRS operate in a multi-stakeholder environment including platform providers, artists, record companies, and music consumers/listeners [10], it is most commonly the music consumers/listeners who are considered the users of an MRS. In the paper at hand, we also take this perspective.

Traditionally, content-based filtering (CBF) and collaborative filtering (CF), or hybrid combinations thereof, have been the most common algorithms to create recommender systems [56].
The former assumes that users will like items similar to the ones they liked in the past, and therefore selects items to recommend according to some notion or metric of similarity in terms of item content (e.g., music style, timbre, or rhythm) between the user's liked items and unseen items from the catalog. In contrast, CF assumes that a user will prefer items that are liked by other users with similar preferences. In this case, items to recommend are, for instance, found by comparing the target user's consumption or rating profile to that of the other users, identifying the most similar other users, and recommending what they liked (user-based CF). Alternatively, users and items can be directly matched via similarities computed in a joint low-dimensional representation of users and items (i.e., model-based CF).

Enhancing the classical approaches CF and CBF, in recent years, researchers started to leverage additional information (beyond users, items, and their interactions) to improve recommendations. Recommender systems that consider user characteristics or information describing a situation are typically referred to as context-aware recommendation systems [2]. Next to considering time and location as contextual side information, taking information derived from the user's country into account has been demonstrated to improve recommendation quality; for instance, cultural and socio-economic characteristics of the user's country [83], or the proximity of the user's taste to their country-specific music mainstream ("mainstreaminess") [9].

Against this background, we approach the task of context-aware music recommendation based on country information; in contrast to most previous works, we consider user country in our approach without using any external information about the country, such as cultural, economic, or societal information.
The reason is that the respective data sources about countries (e.g., Hofstede's cultural dimensions, https://geerthofstede.com/research-and-vsm/dimension-data-matrix; the Quality of Government measures, https://qog.pol.gu.se; or the World Happiness Report, https://worldhappiness.report) provide information on the country level, which may not necessarily reflect the circumstances of individual users and, thus, can introduce problems in the recommendation process. For instance, cultural values or income may be very unequally distributed among a country's population.

To avoid this, instead of using external information derived from the user's country, we leverage purely the self-reported country information of the users as available in the system, and investigate (1) how behavioral data about music listening can be used to identify archetypal country clusters based on track listening preferences, (2) how users can be modeled using the results of (1), and (3) how the resulting user models can be integrated into a state-of-the-art deep learning-based music recommendation algorithm.

As in many other domains, deep neural network architectures nowadays dominate research in music recommendation systems, due to their ability to automatically learn features from low-level audio signals and their superior performance [62]. This article is no exception. We propose a multi-layer generative model in which contextual features can influence recommendations through a gating mechanism.

In this context, we formulate the following research questions:

RQ1: To what extent can we identify and interpret groups of countries that constitute music preference archetypes, from behavioral traces of users' music listening records?

RQ2: Which are effective ways to model the users' geographic background as a contextual factor for music recommendation?

RQ3: How can we extend a state-of-the-art recommendation algorithm, based on variational autoencoders, to include user context information, in particular, the geo-aware user models developed to answer RQ2?

In the remainder of this article, we first explain the conceptual foundation of our work and discuss it in the context of related research (Section 3). Subsequently, we detail the methods we adopt to investigate the research questions; in particular, we specify the approaches used for data preparation, clustering, user modeling, and track recommendation (Section 4). The results of our experiments on uncovering geographic music listening archetypes and on music track recommendation, together with a detailed discussion thereof, are presented in Section 5. Finally, Section 6 concludes the article with a brief summary of the major findings, a discussion of limitations, and pointers to future work.

A multitude of factors have been found to influence an individual's music preferences. There is a long history of research investigating the relationships between music preferences and, for instance, demographics [13, 19, 20], personality traits [55, 59], and social influences [14, 75].

In the middle of the nineteenth century, a cultural hierarchy emerged in America [26, 46] in which people of high social status patronized the fine arts (referred to as “highbrow”), while all other forms of popular culture were associated with a lower status (referred to as “middlebrow” or “lowbrow”). In the 1990s, a series of studies [52, 53] defended the view that, for the elite, highbrow was being replaced by a consumption pattern termed “omnivorousness”. Cultural omnivorousness reflects that people's taste includes both elite and popular genres. This was subsequently shown to hold for various countries [e.g. 21, 28, 34]. Also, the consumption practices of low-status taste were reconceptualized: The earlier view that the lowbrow group would be willing to consume any entertainment on offer [35] was replaced by the finding that low-status people tend to choose one form of entertainment and avoid others [16]. Thus, overall, the view evolved from highbrow–lowbrow to omnivore–univore. Analyzing music consumption across eight European countries, Coulangeon and Roharik [23] supported the “omnivore–univore” scheme rather than the former “highbrow–lowbrow” model. The omnivorous cultural taste was later found to be unstable over time [57], though. Katz-Gerro [41] has shown that the dividing line of class distinctions varies across countries and that the associations of genres with social classes differ as well. She concludes that, while class matters, the main determinants of cultural preferences relate to gender, education, and age [40].
Coulangeon [22] questions the earlier view on the reasons for the different tastes of higher- and lower-status classes: He challenges the idea that the upper class's familiarity with the so-called “legitimate” culture, and the limited access of lower-status classes to that culture, is what distinguishes the preferences of the upper class from those of lower-status classes. Instead, he attributes the difference to the diversity of the stated preferences of people of the upper class, whereas the preferences of members of lower-status classes appear more exclusive. Later work, studying music taste in the “modern age” [51], found little evidence that musical taste is indeed aligned with class position.

Although there is a multitude of factors that influence an individual's music preferences and lead to a diversity of music created and listened to, there are (market) structures and other mechanisms that bring about certain tendencies in what music is preferred within a particular community. For instance, the music recording industry is typically considered a globally oriented market [27]. Yet, studies have revealed the existence of national boundaries [8]. There are various country-specific mechanisms that affect an individual's music preferences and consumption behavior: Preferences are culturally shaped [6, 17]; music perceptions vary across cultures, for instance, with respect to mood [45, 50, 70, 74]; and countries have substantially different national market structures with respect to, for instance, available music repertoire due to copyright and licensing, advertising campaigns, local radio airplay, or quotas for national artists [30, 36].

Knowledge about country-specific differences in music preferences can be explicitly used to improve music recommender systems, for instance, by leveraging information about the users' geographic or cultural background.
For instance, Vigliensoni and Fujinaga [80] use a factorization machine approach for matrix factorization and singular value decomposition to integrate, amongst others, a user's country as context information. Bauer and Schedl [9] use a contextual pre-filtering approach [2], where the user base is first segmented by user country, and a target person is then compared to other people from the same country (in contrast to a comparison with the entire user base). Sánchez-Moreno et al. [58] use a k-nearest neighbor (k-NN) approach integrating, amongst others, the user's country as an attribute. Zangerle et al. [83] leverage further country-specific data sources; for each country, they use the respective scores on the cultural dimensions by Hofstede [33] as well as the scores of the World Happiness Report [32] to tailor recommendations to the individual.

The work at hand differs from related work in several aspects.

• First, although music preferences vary across countries, several studies [e.g. 9, 49, 54, 67] have shown similarities in music preferences between countries, typically identified with clustering approaches. Yet, to the best of our knowledge, the work at hand is the first one to integrate information on country similarities into the music recommendation approach.

• Second, while other work, most notably Zangerle et al. [83], reaches out to include external data about countries (such as economic factors, happiness index, cultural dimensions), the approach at hand remains independent from any external data sources, enabling platform providers to build a self-sustaining recommendation system. Such a system can rely exclusively on data that is contained in the provider's platform, including users' self-disclosed country information.

• Third, most existing research on music preferences and recommender systems considers music preferences on a genre level [e.g. 1, 71] or artist level [e.g. 9, 58]. Research on country-aware music recommendation systems that provide recommendations on the track level is rare [e.g. 83]. However, the genre and the artist level may be too coarse-grained to reflect users' music preferences, for several reasons. Music genres are vaguely defined [11, 72, 81] and users' perceptions thereof differ tremendously [15, 79]. Artists frequently cover several music styles throughout their career, where some tracks may be more favored than others for reasons including lyrics quality, the influence of associated music videos, over-exposure, or associations with unpleasant personal experiences [24]. Accordingly, the work at hand investigates music recommendations on the track level to reflect users' preferences in a more fine-grained manner than genre labels attributed to an artist's overall repertoire could do.

• Fourth, while deep learning approaches are increasingly used for recommender systems in general and for music recommendation in particular, the integration of geographic aspects (especially user country) with deep learning for music recommendation is a particular asset of the work at hand. For instance, a recent survey on deep learning-based recommender systems [7] reports that extant research mainly uses textual information to capture context in approaches to context-aware recommender systems. The authors particularly consider context that is extracted from items (e.g., text documents) instead of users.

In the following, we detail how we gather and process the dataset used in our study, which contains information about users' music listening behavior (Section 4.1). We then describe our approach to identify country clusters based on this dataset (Section 4.2). Finally, we elaborate on our approaches to create user models incorporating country information and we detail our neural network architecture that integrates these models (Section 4.3).

We base our investigations on the LFM-1b dataset [60], which we filter according to our requirements as detailed below. The LFM-1b dataset contains music listening information for 120,322 Last.fm users, totaling 1,088,161,692 individual listening events (LEs) generated between January 2005 and August 2014; the majority of LEs were created during the years 2012–2014. Each LE is characterized as a quintuple of user-id, artist-id, album-id, track-id, and timestamp. The average number of LEs per user in the dataset is 8,879 (std. 15,962). For some users, demographic data (country, age, and gender) is also available in LFM-1b. More precisely, 46% of the users provide information about their country, the same percentage provide information about their gender, and 62% about their age. The largest share of users who provide their country are from the US (18.5%), followed by Russia (9.1%), Germany (8.3%), the UK (8.3%), Poland (8.0%), Brazil (7.0%), and Finland (2.6%). The mean age of the users who reveal it is 25.4 years (std. 9.4); the median age is 23 years. The age distribution differs significantly between countries, though. In Figure 1, we show the age distribution for the countries with at least 100 users (47 countries), categorized into age groups. The youngest users are found in Estonia and Poland, while the oldest users are Swiss and Japanese. Among the users who indicate their gender, 72% are male and 28% are female. These percentages differ, however, considerably between countries. In Figure 2, we therefore depict the ratios between genders, again for the top 47 countries in terms of number of users. While the Baltic countries Lithuania and Latvia have an almost equal share of male and female users, India and Iran show a very unequal distribution (around 90% male users).

As reported above, about 46% of users in the LFM-1b dataset disclose their country. For our country-specific analysis, we therefore only consider users (and their LEs) for whom country information is available.
This results in a dataset of 55,186 users, who have listened to a total of 26,021,362 unique tracks. The distribution of the number of LEs over tracks is visualized in Figure 3.

We subsequently reduce the data to decrease noise originating from the user-generated nature of the metadata in the LFM-1b dataset (in particular, misspellings and ambiguities), i.e., we filter out tracks and countries. This noise would otherwise likely cause distortions in subsequent steps of our approach. First, we drop tracks that have been listened to fewer than 1,000 times globally, resulting in a total of 122,442 tracks to consider further. Second, to minimize possible distortions caused by countries with a low number of LEs or a low number of unique users, we only consider countries with at least 80,000 LEs and at least 25 users. We chose these values as thresholds based on an empirical investigation of the distributions of LEs and of users over countries (cf. Figures 4 and 5, respectively). The former shows a flat characteristic around country-id 100, followed by a clear gap between country-id 110 and 111 (which corresponds to 80,000 LEs). The latter reveals a sudden drop at country-id 70 (which corresponds to 25 users). Applying this country filtering eventually results in 70 unique countries and a total of 369,290,491 LEs, which represents only a small drop of

Fig. 1. Distribution of age over countries. Countries are sorted in decreasing order of number of users from left to right.

Fig. 2. Distribution of gender over countries. Countries are sorted in decreasing order of number of users from left to right.
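The two filtering steps described above can be illustrated with a minimal sketch. It assumes that listening events are available as `(user_id, country, track_id)` tuples and that the country thresholds are evaluated after the track filter; both are assumptions made for illustration, and the function name `filter_listening_events` is hypothetical rather than taken from the authors' code.

```python
from collections import Counter, defaultdict


def filter_listening_events(events, min_track_les=1000,
                            min_country_les=80_000, min_country_users=25):
    """Filter (user_id, country, track_id) tuples, one tuple per listening event.

    Assumption: country statistics are computed after rare tracks are removed.
    """
    events = list(events)

    # Step 1: keep only tracks with at least `min_track_les` LEs globally.
    track_counts = Counter(track for _, _, track in events)
    events = [e for e in events if track_counts[e[2]] >= min_track_les]

    # Step 2: keep only countries with enough LEs and enough distinct users.
    country_les = Counter(country for _, country, _ in events)
    country_users = defaultdict(set)
    for user, country, _ in events:
        country_users[country].add(user)
    keep = {
        c for c in country_les
        if country_les[c] >= min_country_les
        and len(country_users[c]) >= min_country_users
    }
    return [e for e in events if e[1] in keep]


# Toy usage with tiny thresholds (the values in the text are 1,000 / 80,000 / 25).
toy = [("u1", "AT", "a"), ("u2", "AT", "a"), ("u1", "AT", "b"), ("u3", "DE", "a")]
kept = filter_listening_events(toy, min_track_les=2,
                               min_country_les=2, min_country_users=2)
```

In the toy data, track `b` is dropped by the first step and country `DE` by the second, so only the two Austrian listening events of track `a` remain.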

Fig. 3. Distribution of number of listening events over all tracks (semi-log-scaled). Track identifiers are ordered by number of LEs.

Fig. 4. Distribution of number of listening events over all countries (semi-log-scaled). Country identifiers are ordered by number of LEs.

Fig. 5. Distribution of number of users over all countries (semi-log-scaled). Country identifiers are ordered by number of users.

To cluster countries according to their citizens' listening behavior, it is important to first normalize the data of each country to avoid distortions caused by different country sizes. To this end, we normalize each country's feature vector to sum up to one. Please note that country-specific results may still be influenced by some users showing particularly high playcounts. Nevertheless, we decided against excluding or penalizing the listening information of such users, precisely because a high playcount indicates a more pronounced inclination to listen to music. Our reasoning is that users who contribute only few listening events to Last.fm should be considered less important to model their country-specific listening behavior than users who heavily contribute. In addition, removing such “power listeners” would distort the original distribution of users' playcounts in the sample.

We next apply truncated SVD/PCA [31], reducing the dimensionality of the feature vectors to 100, while still preserving 99.8% of the variance in the data. By comparison, reducing the dimensionality of the dataset to 50 dimensions preserves only 90.1% of the variance. Taking these 100-dimensional feature vectors as input to t-distributed Stochastic Neighbor Embedding (t-SNE) [78] and subsequently using OPTICS [4] enables us to visualize the data and identify clusters of countries sharing similar music listening behaviors.

t-SNE is a visualization technique that embeds high-dimensional data in a low-dimensional (typically, two-dimensional) visualization space, paying particular attention to preserving the local structure of the original data. It is particularly useful to disentangle data points that lie on more than one manifold. t-SNE represents proximities or affinities between pairs of data items by estimating the probability that the first data item will choose the second one as its nearest neighbor, and vice versa. In the original data space, this probability is modeled by means of a Gaussian distribution centered around each data item in the high-dimensional space; in the visualization space, by means of a Student's t-distribution centered around each data item in the low-dimensional space. The Kullback-Leibler divergence between the joint distributions of pairs of data points in the original space and in the visualization space is then minimized via gradient descent.

OPTICS (Ordering Points To Identify the Clustering Structure) is a density-based clustering method that creates a linear ordering of data items based on their spatial proximity. For this purpose, OPTICS first identifies core data points that have at least a certain number of neighbors in their vicinity (the minimum cluster size) and assigns a core distance to them, describing how dense the area around each core point is. Furthermore, a reachability distance between each pair of data items o and p is established, which is the maximum of (1) the distance between o and p and (2) the core distance of o. Data items assigned to the same cluster have a lower reachability distance to their nearest neighbors than items that belong to different clusters. OPTICS subsequently creates an ordering of data items in terms of their reachability distance and identifies sudden changes in reachability between neighboring items, assuming that these correspond to cluster borders. The number of clusters is controlled by a parameter ξ that defines the minimum steepness (relative change in distance) between neighboring data items to be considered a cluster boundary.

As for parameter optimization, we adopt a grid search strategy to identify a well-suited perplexity for t-SNE (5) and a minimum size of clusters, i.e., minimum number of data items in each cluster, for OPTICS (3). Please note that we use ISO 3166 two-letter country codes to refer to countries in this article.

For an analysis of the identified clusters in a way that enables the establishment of archetypes of music preferences, we adopt the following approach. As shown in Figure 3, we observe a long-tail distribution of listening events over tracks, which means that a few dominating tracks are listened to by a lot of users, while most tracks are only listened to by a few users. Thus, these dominating tracks will also rank high in the list of top tracks of every cluster, which makes it hard to distinguish between the clusters and to interpret their corresponding archetypes. To overcome this, we adapt a scoring function similar to the inverse document frequency (IDF) [38] metric from the field of information retrieval, which assigns high scores to rarely occurring tracks and low scores to frequently occurring tracks.
Formally, we define IDF for each track $t_i$ as $\mathrm{IDF}(t_i) = \log \frac{N}{n_i}$, where $N$ is the number of all listening events and $n_i$ is the number of LEs for track $t_i$. The distribution of IDF values of the top 50 tracks, in terms of $\mathrm{IDF}(t_i)$, is plotted in Figure 6. In an empirical analysis, we identify 10 overall dominating tracks using a threshold of 4.2 on the IDF values (see Figure 6). These tracks are Rolling in the Deep by Adele, Somebody That I Used to Know by Gotye, Islands and Intro by The xx, Blue Jeans by Lana Del Rey, Supermassive Black Hole by Muse, Skinny Love by Bon Iver, as well as Use Somebody, Sex on Fire, and Closer by Kings of Leon. We remove these tracks from further analyses when discussing archetypes as these are not suited to discriminate between clusters.

In our analysis of archetypes, we include genre annotations, which we obtain as follows. For all tracks in the dataset, we retrieve the top user-generated tags using the Last.fm API. Subsequently, we filter the tags of each track using a comprehensive list of music genres and styles from Spotify, called Spotify microgenres [37]. This list contains 3,034 genre names (as of May 2019 when we extracted them), including umbrella genres such as pop and country, as well as smaller niches such as Thai hip-hop, German metal, and discofox [37]. The fine-grained reflection of subtle differences in microgenres provides a more particularized basis for describing the clusters, compared to the use of a more coarse-grained taxonomy of music genres. We note as a limitation that the microgenre categories are defined in a similarly vague manner as coarse-grained taxonomies of music genre [11, 72, 81]; and the semantics associated with (micro)genre names have evolved over time so that a precise definition appears difficult.
Relying on a big corpus of data where microgenres are visualized and sonified (see The Every Noise project, http://everynoise.com), we nevertheless believe that using the concept of microgenres helps future research to build upon our work. Further note that we rely on the top user-generated tags from the Last.fm community for attributing microgenres to tracks; the microgenre–track associations, thus, reflect the Last.fm community's understanding of microgenres, which may not be congruent with the music experts' understanding. Additionally, synonyms may be present in the user-generated tags and, thus, two different tags could be used interchangeably to annotate the same tracks (e.g., “Rap” and “HipHop”).

To allow interested readers to conduct further analyses of the identified clusters on a microgenre level, we release the full list of the top 20 tracks (and corresponding artists) per cluster, and we include, for each track and artist, all microgenre annotations (cf. “Accompanying Resources”).

Regarding the clustering setup described above: In this work, we use Euclidean distance as distance metric and set ξ = . . More precisely, we performed grid search on t-SNE perplexity in the range [1, 2, 3, 5, 10, 15, 20, 25, 30, 35, 40, 50] and on the minimum number of data points per cluster enforced by OPTICS in the range [2, 3, 4, 5], optimizing for average neighborhood preservation ratio (nearest neighbor consistency).

Fig. 6. Inverse document frequency (IDF) scores for the top 50 tracks.
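The IDF-style scoring used to detect overall dominating tracks can be sketched as follows. The text does not state the base of the logarithm, so the sketch takes it as a parameter (base 10 is an assumption), and it assumes that dominating tracks are those whose score falls below the threshold, since frequent tracks receive low scores. The function names are hypothetical.

```python
import math
from collections import Counter


def idf_scores(track_ids, base=10.0):
    """IDF(t_i) = log(N / n_i) over a sequence with one track id per listening event.

    N is the total number of listening events, n_i the number of events of track t_i.
    The logarithm base is an assumption; the text does not specify it.
    """
    counts = Counter(track_ids)
    n_total = sum(counts.values())
    return {track: math.log(n_total / n, base) for track, n in counts.items()}


def dominating_tracks(scores, threshold):
    """Tracks whose IDF score lies below the threshold (assumed reading of the text)."""
    return sorted(track for track, score in scores.items() if score < threshold)


# Toy usage: one very popular track and two rare ones.
toy_events = ["hit"] * 90 + ["rare1"] * 9 + ["rare2"]
scores = idf_scores(toy_events)
dominant = dominating_tracks(scores, threshold=1.0)
```

With 100 toy events, the popular track obtains a score close to zero while the rarest track obtains the highest score, so only the popular track falls below the toy threshold.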

We build our context-aware music recommendation approach on top of a variational autoencoder (VAE) model [39]. VAEs are a type of autoencoder [43] that consists of an encoder, a decoder, and a loss function. In contrast to classic autoencoders, which learn encodings directly, VAEs learn the distribution of encodings using variational inference. Via sampling from the learned distribution, more representations of the same items can be generated given the same amount of training data. Thus, VAEs can learn more complex items than classic autoencoders.

We opted to extend the VAE architecture for collaborative filtering presented by Liang et al. [47] because in a large-scale study conducted by Dacrema et al. [25], the approach followed by Liang et al. [47] was found to be the only deep neural network-based approach that outperformed equally well-tuned non-deep-learning approaches. In addition, Liang et al. [47] evaluated their VAE architecture on the Million Song Dataset [12], a common benchmark in the music domain. They showed substantially superior performance compared to several baselines, in particular, the linear model weighted matrix factorization (WMF) and collaborative denoising autoencoders (CDAE).

As depicted in Figure ??, we extend the VAE architecture by integrating context information using a gating mechanism. The gate output modulates the latent code in a way to incorporate context-based (country and cluster) differences of users. The abstract concepts are weighted based on how important the models deem them for a specific user group. Specifically, we model users in the form of a 122,442-dimensional listening vector (i.e., n_tracks), which represents their track listening history, together with context information.
We investigate four different ways to define a user’s context: (1) the user’s country, (2) the cluster membership of the user’s country, (3) the Euclidean distances between the user’s listening vector and all identified cluster centroids, and (4) the Euclidean distances between the user’s listening vector and all country centroids.

We derive context from the self-reported country of a user. For our VAE model with country context (i.e., model 1), a one-hot encoding of the 70 included countries is used, whereas for VAE with cluster context (i.e., model 2), context is determined by the user’s country membership in a cluster (see Table 1), resulting in a one-hot encoding of length 9. For the context models 3 and 4, we first calculate the cluster centroids, i.e., each track’s listening events of all users belonging to a cluster are summed and then normalized by the total amount of listening events across all tracks. Subsequently, for each user, the Euclidean distances between the respective user’s normalized feature vector and all cluster centroids are determined and used as context features for the VAE with cluster distances (i.e., model 3). Country distances are calculated accordingly, where each country is considered as its own cluster (i.e., model 4). Taken together, n_context is 70 in case of model 1 and model 4, and 9 in case of model 2 and model 3.
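The four context definitions can be illustrated with a minimal sketch. It assumes that a user is given as a list of per-track playcounts and that group labels (country codes or cluster ids) are known per user; the function names `normalize`, `centroids`, `distance_context`, and `one_hot` are hypothetical and not taken from our implementation.

```python
import math

def normalize(counts):
    """Scale a playcount vector so that its entries sum to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def centroids(user_vectors, user_groups):
    """Per group: sum each track's listening events over all users of the
    group, then normalize by the group's total number of listening events."""
    sums = {}
    for vec, group in zip(user_vectors, user_groups):
        acc = sums.setdefault(group, [0.0] * len(vec))
        for i, c in enumerate(vec):
            acc[i] += c
    return {g: normalize(v) for g, v in sums.items()}

def distance_context(user_vector, cents, order):
    """Models 3 and 4: Euclidean distances between the user's normalized
    listening vector and every centroid, in a fixed group order."""
    u = normalize(user_vector)
    return [math.dist(u, cents[g]) for g in order]

def one_hot(label, order):
    """Models 1 and 2: one-hot encoding of a country or cluster label."""
    return [1.0 if label == g else 0.0 for g in order]

# Toy data (assumption): 4 users, 3 tracks, 3 countries in 2 clusters.
users = [[4, 0, 0], [2, 2, 0], [0, 0, 5], [0, 1, 3]]
countries = ["AT", "DE", "BR", "BR"]
cluster_of = {"AT": 1, "DE": 1, "BR": 7}
clusters = [cluster_of[c] for c in countries]

model1 = one_hot(countries[0], ["AT", "BR", "DE"])                      # country id
model2 = one_hot(clusters[0], [1, 7])                                   # cluster id
model3 = distance_context(users[0], centroids(users, clusters), [1, 7])
model4 = distance_context(users[0], centroids(users, countries), ["AT", "BR", "DE"])
```

Passing country labels instead of cluster labels to `centroids` yields the country centroids of model 4, which mirrors the statement that each country is treated as its own cluster.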

Table 1. Country clusters as determined by OPTICS with a minimum cluster size of 3, based on the output of a t-SNE visualization (perplexity of 5) on PCA-reduced country feature vectors (100 dimensions). Countries identified as too noisy by OPTICS are represented as Cluster -1.

Our recommendation approach assumes that each user can be represented by a latent $k$-dimensional multivariate Gaussian, which is sampled, weighted by gates derived from context information, and transformed with a non-linear function to reconstruct the initial track listening history (cf. Figure ??). As mentioned before, our VAE model without contextual features is based on the work of Liang et al. [47]. To integrate context models, we extend the VAE by adding a gating mechanism to feed in contextual information according to the four ways detailed above. In a two-layer feed-forward neural network, the initial feature vector $t$ is encoded first into an intermediate representation $enc$ and then into a $k$-dimensional multivariate Gaussian. The mean values $\mu$ and variance values $\sigma$ are the outputs of the encoding network:

$$enc = \tanh\left(W_{enc} \cdot t\right) \tag{1}$$

$$\mu = \tanh\left(W_{enc}^{\mu} \cdot enc\right) \tag{2}$$

$$\sigma = \tanh\left(W_{enc}^{\sigma} \cdot enc\right) \tag{3}$$

We use tanh as a nonlinearity for all layers in the autoencoder. Based on our experiments (see Section 5.2.1), we set the size of $W_{enc}$ to n_tracks × 1,200 and the size of $W_{enc}^{\mu}$ and $W_{enc}^{\sigma}$ to 1,200 × 600, which gives a size of 1,200 for the intermediate representation $enc$ and 600 for the latent representation $z$. The user context, given by its input vector $c$, is transformed by a dense layer with sigmoid nonlinearity into a context gate $c_{gate}$ of the same length as the latent $z$ (in Eq. 4, $\sigma(\cdot)$ denotes the sigmoid function, not the variance vector). Next, the gate is applied with component-wise multiplication to $z$:

$$c_{gate} = \sigma\left(W_{context} \cdot c\right) \tag{4}$$

$$\epsilon \sim \mathcal{N}(0, 1) \tag{5}$$

$$z = (\mu + \sigma \odot \epsilon) \odot c_{gate} \tag{6}$$

The weighted latent representation is then decoded back into the original space by a network that mirrors the encoder in size but has different learned parameters:

$$dec = \tanh\left(W_{dec} \cdot z\right) \tag{7}$$

$$\hat{t} = \tanh\left(W_{dec} \cdot dec\right) \tag{8}$$

The detailed data flow and computation in each layer is visualized in Figure ??.

Footnote: We also ran experiments in which we simply concatenated track listening history and context information, but this did not show improvements over the VAE based on just the listening history.
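The forward computation of Eqs. (1)–(8) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not our training code: weights are random instead of learned, the loss function is omitted, layer sizes are tiny, matrices are stored as (outputs × inputs) so that `W · x` is a matrix-vector product, and the key `W_dec_out` is a hypothetical name for the second decoder matrix.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) with vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def tanh_layer(W, x):
    return [math.tanh(a) for a in matvec(W, x)]

def sigmoid_layer(W, x):
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(W, x)]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(n_tracks, n_context, n_hidden, n_latent, seed=0):
    """Random (untrained) weights; the paper uses n_hidden=1,200, n_latent=600."""
    rng = random.Random(seed)
    return {
        "W_enc": random_matrix(n_hidden, n_tracks, rng),
        "W_enc_mu": random_matrix(n_latent, n_hidden, rng),
        "W_enc_sigma": random_matrix(n_latent, n_hidden, rng),
        "W_context": random_matrix(n_latent, n_context, rng),
        "W_dec": random_matrix(n_hidden, n_latent, rng),
        "W_dec_out": random_matrix(n_tracks, n_hidden, rng),  # hypothetical name
    }

def forward(params, t, c, rng=None):
    """Return the reconstruction t_hat and the context gate for one user."""
    enc = tanh_layer(params["W_enc"], t)             # Eq. (1)
    mu = tanh_layer(params["W_enc_mu"], enc)         # Eq. (2)
    sigma = tanh_layer(params["W_enc_sigma"], enc)   # Eq. (3)
    gate = sigmoid_layer(params["W_context"], c)     # Eq. (4)
    # Eq. (5): sample noise; without an rng we use eps = 0 (the mean code).
    eps = [rng.gauss(0.0, 1.0) for _ in mu] if rng else [0.0] * len(mu)
    z = [(m + s * e) * g for m, s, e, g in zip(mu, sigma, eps, gate)]  # Eq. (6)
    dec = tanh_layer(params["W_dec"], z)             # Eq. (7)
    t_hat = tanh_layer(params["W_dec_out"], dec)     # Eq. (8)
    return t_hat, gate

def top_k(scores, known, k):
    """Rank track indices by score, skipping tracks already in the history."""
    candidates = [i for i in range(len(scores)) if i not in known]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]

params = init_params(n_tracks=6, n_context=3, n_hidden=4, n_latent=2)
t_hat, gate = forward(params, t=[1, 0, 1, 0, 0, 0], c=[0, 1, 0])
recommended = top_k(t_hat, known={0, 2}, k=3)
```

Because the gate lies strictly between 0 and 1 in every component, the context can only attenuate latent dimensions, which is one way to read the statement that abstract concepts are weighted per user group.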
Based on the known track history of a target user, the models generate a variational distribution $\hat{t}$. Top-$k$ track recommendations are then retrieved by ranking the mean values of this distribution.

In the following, we present and interpret the results of our approach to identify country clusters and archetypes of music listening preferences (Section 5.1) and of the music track recommendation experiments (Section 5.2). We further connect the discussion to the initial research questions, which we answer in the context of the obtained results.

We present the identified clusters and discuss the relationship of the countries subsumed in each cluster beyond music preferences (Section 5.1.1), for instance, in terms of geographic proximity, linguistic similarities, and historical background. Furthermore, we discuss differences in user characteristics such as the users’ gender, age, and their listening patterns in terms of playcounts. In Section 5.1.2, we describe the characteristics of the clusters with respect to music preferences, i.e., we detail the track preferences that characterize the corresponding music archetypes.

Using the approach described in Section 4.2, we can identify nine country clusters, which are presented in Table 1 and visualized in Figure 7. Cluster 0 contains Spain (ES), Portugal (PT), Italy (IT), Slovenia (SI), and Iceland (IS). Cluster 1 includes as many as nine countries: Belgium (BE), the Netherlands (NL), Austria (AT), Switzerland (CH), Germany (DE), Czech Republic (CZ), Slovakia (SK), Poland (PL), and Finland (FI). Cluster 2 refers to the United Kingdom (GB), Estonia (EE), and Japan (JP). Cluster 3 includes Australia (AU), New Zealand (NZ), the United States (US), Canada (CA), and the Philippines (PH). Cluster 4 refers to Chile (CL), Costa Rica (CR), Uruguay (UY), and Israel (IL). Cluster 5 contains Colombia (CO), Mexico (MX), Bulgaria (BG), and Greece (GR). Cluster 6 comprises the following countries: Romania (RO), Egypt (EG), Iran (IR), Turkey (TR), and India (IN). Cluster 7 is composed of Brazil (BR), Indonesia (ID), Vietnam (VN), and Malaysia (MY). Cluster 8 encompasses eight countries: Lithuania (LT), Latvia (LV), Ukraine (UA), Belarus (BY), Russia (RU), Moldova (MD), Kazakhstan (KZ), and Georgia (GE).

Fig. 7. Results of t-SNE (perplexity of 5) and OPTICS (minimum cluster size of 3) on country feature vectors. The left part shows the full t-SNE output space, the right part a zoomed version onto the major clusters.

Four of the countries in Cluster 0 are geographically tied together, sharing national borders (i.e., Spain (ES), Portugal (PT), Italy (IT), and Slovenia (SI)). Only Iceland (IS) is geographically dislocated. Furthermore, the languages of Spain (ES), Portugal (PT), and Italy (IT) share their roots as Romance languages. Moreover, there is a Slovene minority in Italy (IT), which may lead to partly similar music preferences in Slovenia (SI) and Italy (IT).

Cluster 1 contains nine countries. Belgium (BE) and the Netherlands (NL) are neighboring countries and share the official language spoken (note, Belgium (BE) has two official languages). Austria (AT), Switzerland (CH), and Germany (DE) share the German language (note, Switzerland (CH) has four official languages). Czech Republic (CZ) and Slovakia (SK) are not only neighboring countries, but actually formed one joint country until 1992. The languages spoken in the Czech Republic (CZ), Slovakia (SK), and Poland (PL), a neighboring country to the former two, show strong linguistic similarities. Altogether, we can see that Belgium (BE), the Netherlands (NL), Austria (AT), Switzerland (CH), Germany (DE), Czech Republic (CZ), Slovakia (SK), and Poland (PL) are geographically connected, sharing national borders (cf. Figure 8). Only Finland (FI) is geographically disconnected from the other countries in this cluster.

[Preprint version as of September 22, 2020]

Fig. 8. Countries in Cluster 1 on a map.

Cluster 2 delivers a highly surprising result because it contains three countries that are geographically far away from each other without any linguistic similarities or close historical connections: the United Kingdom (GB), Estonia (EE), and Japan (JP). The United Kingdom (GB) and Estonia (EE) are located at the Northwest and the Northeast of Europe and, thus, at the opposite borders of Europe; Japan (JP) is even almost 8,000 km farther east of Estonia (EE). Although this cluster contains only three countries, with Japan (JP) and the United Kingdom (GB), it embraces two of the largest music markets worldwide [73]. Interestingly, the United Kingdom (GB) is not part of Cluster 3 that includes most English-speaking countries. Considering the age distribution (Figure 9) in the identified country clusters, we find that Cluster 2 shows the highest average age with a relatively large span. Furthermore, Cluster 2 shows by far the highest average playcount per user for the countries in this cluster (Figure 10). This indicates that users in this cluster are characterized as being ‘power listeners’. As the combination of countries in this cluster seems surprising, age and listening intensity may be the hidden, though determining, aspects for the emergence of this cluster.

Footnote: Please note that observations concerning age relate to our sample of Last.fm users.


Fig. 9. Age distribution of users in the identified country clusters (x-axis: Clusters 0 to 8; y-axis: age distribution). While the oldest users can be found in Cluster 2, the youngest can be found in Cluster 7.

The major connector of the countries in Cluster 3 is that they are all English-speaking countries: Australia (AU), New Zealand (NZ), United States (US), Canada (CA), and the Philippines (PH), where English is one of the two official languages in both Canada (CA) and the Philippines (PH).

Cluster 4 comprises the countries Chile (CL), Uruguay (UY), Costa Rica (CR), and Israel (IL). Both Chile (CL) and Uruguay (UY) are located in South America and are connected by their language: Spanish. The official language in Costa Rica (CR) is Spanish as well; located in Central America, it is geographically not far from Chile (CL) and Uruguay (UY). Israel (IL), in contrast, is a country in the Middle East and is, thus, geographically disconnected from the other three countries in this cluster.

Cluster 5 contains two Latin-American countries as well as two countries in Southeastern Europe. The Latin-American countries, i.e., Mexico (MX) and Colombia (CO), are both Spanish-speaking countries. With Mexico (MX) located in the Southern part of North America and Colombia (CO) being part of South America, these are not neighboring countries, though. The two countries in Southeastern Europe, i.e., Bulgaria (BG) and Greece (GR), share a border. Thus, the cluster contains two country groups, which are geographically spread.

The countries in Cluster 6 are geographically connected, centered around countries being part of the Middle East (Turkey (TR), Iran (IR), and Egypt (EG)) and flanked by Romania (RO), which has historical relations to the others due to the Ottoman Empire, and India (IN), which is adjacent to the Middle East and, thus, shows a geographical proximity to the other countries in this cluster. Furthermore, all the countries in Cluster 6 are very diverse when it comes to the various (minority) languages spoken, which may also be reflected in music preferences.
Considering the female/male ratio of users (Figure 11) in the identified country clusters, we find that Cluster 6 shows the most unevenly distributed ratio across the countries in this cluster. Despite the wide span of female/male ratios in this cluster’s countries, Cluster 6 is the cluster with the overall lowest female/male ratio compared to the other clusters. With respect to age (Figure 9), this cluster comprises rather young users in our sample of the Last.fm community (with the average age of users in the Clusters 7 and 8 being even younger, though). Overall, with respect to age and gender, Cluster 6 seems to have a differentiating profile compared to the other clusters. Furthermore, Cluster 6 shows by far the lowest average playcount per user (Figure 10). This low number could be the result of a listening pattern that is shaped by cultural aspects, but could, for instance, also be the consequence of limited access to the resources (e.g., broadband Internet connection, streaming platforms operating in the respective countries, licenses for music content). Considering those and similar aspects is a fruitful path for future research.

Fig. 10. Distribution of users’ average playcount in the identified country clusters (x-axis: Clusters 0 to 8; y-axis: playcount distribution). While the highest average playcount can be found in Cluster 2, the lowest one can be found in Cluster 6.

Cluster 7 covers three neighboring countries (with maritime borders) in the Southeast of Asia, namely Indonesia (ID), Vietnam (VN), and Malaysia (MY), as well as Brazil (BR) in South America. The three countries in the Southeast of Asia have many similarities, including common frames of reference in history, culture, and religion; also their national languages are closely related. From a geographic perspective, Brazil (BR) appears to be disconnected from the other countries in this cluster. The connection of Brazil (BR) with Indonesia (ID) and Malaysia (MY) is that all three countries have formerly been Portuguese colonies [5]. Whether this historical connection is indeed also conclusive for similar music preferences is subject to further research.
Referring back to Figure 9, where we plot the age distribution for the identified country clusters, and Figure 11, where we plot the female/male ratio, we see that Cluster 7 shows the lowest average age and is close to the highest female/male ratio. Furthermore, the female/male ratio is very evenly distributed in Cluster 7. We, thus, suspect that age and gender are the hidden factors shaping this cluster or, at least, accentuating it.

Fig. 11. Female/male ratio distribution of users in the identified country clusters (x-axis: Clusters 0 to 8; y-axis: female/male ratio distribution). We find that the female/male ratio is most unevenly distributed in Cluster 6 and most evenly distributed in Cluster 7.

As can be seen from Figure 12, Cluster 8 comprises eight countries that are in geographical proximity: the Baltic countries Lithuania (LT) and Latvia (LV), the Russian Federation (RU), Ukraine (UA), Belarus (BY), Moldova (MD), Kazakhstan (KZ), and Georgia (GE). Besides being characterized by the geographic proximity, these countries share a history of having been part of the Russian empire. Russian is a major (or influential) language in all of the countries in this cluster [18].

Overall, we note that the country clusters show different characteristics with respect to age (Figure 9), gender (Figure 11), and average playcount per user (Figure 10). With respect to age, we find especially large differences between the Clusters 2 and 7: While the highest average age can be found in Cluster 2, the lowest average age can be found in Cluster 7. The female/male ratio is high in Cluster 7 and also evenly distributed. In contrast, the female/male ratio is most unevenly distributed in Cluster 6 with a high span of ratios across the countries in this cluster; and overall, the ratio is very low in comparison to the other clusters. With respect to the average playcount per user, it is the Clusters 2 and 6 that show the largest differences: Among the users in Cluster 2 there seems to be a high ratio of ‘power listeners’, whereas the average playcount of users in Cluster 6 is low in comparison. Overall, it can, thus, not be rejected that those and similar aspects may be hidden factors that accentuate the differentiation between the clusters or may even be indicative for the emergence of those clusters.

To address the question of what characterizes the various clusters in terms of music preferences, we use the approach described in Section 4.2 to identify the most important tracks and genres for each cluster. Table 2 provides a list of the 10 tracks with the highest playcounts per cluster (after the IDF-based filtering explained in Section 4.2) and their genre annotations; for genre annotations, we rely on the user-generated annotations retrieved from the Last.fm API and aligned with the Spotify microgenres, as described in Section 4.2. These most important tracks define the music preference archetypes corresponding to each cluster.

Fig. 12. Countries in Cluster 8 on a map.

The most popular tracks in Cluster 0 are mainly attributed to the microgenres indie rock and alternative rock. Six tracks in the top 20 have indie rock as the most associated microgenre, three alternative rock. Eight of 20 tracks have both indie rock as well as alternative rock within their five most associated microgenres. All of the 19 tracks among the top 20 that have microgenres on track level (Si Te Quisieras Venir by Los Planetas does not have microgenres assigned on a track level) are associated with indie rock or alternative rock; most of them with both. Only a few tracks in later positions (thus, not in the top 10) deviate from these genres (e.g., Set Fire to the Rain by Adele ranks on position 14 and is associated with the genres soul and pop, Hurt by Johnny Cash is on position 16 and is mainly associated with country and folk, and Get Lucky by Daft Punk feat. Pharrell Williams is on position 20 and is associated with electronic). With 5 of the 20 most frequently listened tracks in this cluster, the band Arctic Monkeys is particularly dominant in that cluster.

While indie rock and alternative rock are represented in the most frequently listened tracks in Cluster 0 as well as Cluster 1, the tracks in Cluster 1 differ from those in Cluster 0 insofar as they tend to include pop or electronic elements (e.g., VCR by The xx associated with electronic and indie rock or Cosmic Love by Florence + the Machine). Four tracks in the top 20 have indie pop as the most associated microgenre, three electronic.
Ten tracks in the top 20 have indie rock as well as alternative rock as tagged microgenres. For all tracks except Hurt by Johnny Cash and Lonely Day by System of a Down, pop is one of the tagged microgenres. Electronic is associated with 9 of the 20 tracks.

Footnote: We released the full list of the top 20 tracks (and corresponding artists) per cluster and all microgenre annotations (for each track and artist) (cf. “Accompanying Resources”).

In Cluster 2, two tracks that are most associated with folk are among the most popular tracks in the cluster (e.g., Little Lion Man or The Cave by Mumford & Sons). Among the top 20, there are 4 tracks associated mostly with folk. Tracks that are associated with electronic and pop (e.g., Judas by Lady Gaga) and tracks associated with triphop and electronic (e.g., Teardrop by Massive Attack) are also strongly represented. We recall Figure 9 showing that Cluster 2 has the highest average age in our sample of Last.fm users. The high average age of users in Cluster 2 and the tendency to like folk music are in line with previous research that found that folk music is more established among older users compared to younger ones [63, 65]. Yet, the results in Schedl and Ferwerda [65] suggest that the preference for folk music is more prevalent for female than for male users; this seems not to be fully in line with the characteristics of Cluster 2 at first sight because the female/male ratio in Cluster 2 is not particularly high (Figure 11). Delving deeper into the track characteristics, though, we notice that previous works considered a rather coarse-grained taxonomy of genres, whereas the work at hand considers microgenres. Table 2 shows that the 10 most popular tracks in Cluster 2 reflect indie rock (4 out of 10), alternative rock (3 out of 10), and (indie) folk (2 out of 10). In previous work [65], alternative (rock) was associated rather with male users (typically with younger users, though).
So the indie and alternative element may suggest a rather male audience.

While the most listened song in Cluster 3 is associated with country (It Ain’t Cool To Be Crazy About You by George Strait), this cluster shows a lot of tracks that are tagged with folk among the most popular ones for that cluster; 4 of the top 20 have it as their most associated microgenre. The folk tracks are either associated with folk and the singer/songwriter genre (e.g., Flume or Holocene by Bon Iver) or are attributed to indie folk (e.g., In the Aeroplane Over the Sea by Neutral Milk Hotel). Eleven tracks in the top 20 are associated with electronic or electronica within the track’s five most tagged microgenres.

The most popular tracks in Cluster 4 are predominantly associated with progressive rock or alternative rock (e.g., 3 Libras by A Perfect Circle). Within the top 20 of this cluster, 10 tracks are associated with some form of progressive rock and 2 with progressive metal, 14 with alternative rock, and 9 with some form of metal (i.e., progressive metal, alternative metal, doom metal, or with the generic term metal). An interesting deviation from the dominance of the rock genre is the track World’s End by Hatsune Miku & Megurine Luka, who are vocaloid and j-pop artists. Indeed, all playcounts for that track are generated by a single user from Chile (CL); thus, this track is not representative for Cluster 4. A further deviation is constituted by Por la Ventana by Gepe, associated with the genres folk and singer/songwriter, which is listened to by more than one user.

The most popular tracks in Cluster 5 are mostly associated with the psychedelic rock genre. Interestingly, 11 of the 20 most popular tracks are by the band Phase.
An exception from the strong psychedelic rock representation in this cluster is the track Slow Me Down by Anneke van Giersbergen, a track that is associated with the singer/songwriter genre, while the artist is mainly associated with alternative rock and metal, but also pop-rock.

Cluster 6 is characterized by a dichotomy of genres among the most popular tracks. On the one hand, there are tracks associated with singer/songwriter and pop (e.g., If I Could and I Can’t Change by Sophie Zelmani). On the other hand, there is a strong representation of doom metal with tracks such as Without God and Day by Katatonia. Interestingly, both Sophie Zelmani as well as Katatonia are present with several songs among the most popular tracks in this cluster. Recalling the Figures 9, 11, and 10 that visualize the user characteristics for the nine clusters, the uneven distribution with respect to the female/male ratio, the generally low playcount per user (compared to the other clusters), and the young age of its users may be characterizing aspects for Cluster 6 that result in this heterogeneous picture with singer/songwriter and pop tracks, on the one hand, and the strong representation of doom metal, on the other. For instance, Schedl and Ferwerda [65] found pop being more popular among female than male users, while it is the opposite for metal. Interestingly, the results of Schedl and Ferwerda [65] (considering a global sample, also relying on data from Last.fm) suggest that the age group in which the users of Cluster 7 range is the age group that likes pop least of all analyzed age groups, and for liking of metal this age group ranges in the middle field.

The only cluster that includes many popular tracks associated with the pop genre is Cluster 7. Tracks include Skyscraper by Demi Lovato, Come & Get It by Selena Gomez, and Dark Paradise by Lana Del Rey.
Next to the generic tag pop (19 occurrences), the most mentioned microgenres among the top 20 in this cluster are poprock (16 occurrences) and indie pop (13 occurrences), followed by britpop (9), electro pop (6), dance pop (6), dream pop (4), synth pop (3), chamber pop (3), alternative pop (3), teen pop (2), art pop (2), power pop (1), jangle pop (1), and k-pop (1). The high ratio of female users (Figure 11) might be a cohesive characteristic in this cluster as previous work has already shown that female users are more inclined to listen to pop music than male users, in particular in the Last.fm community [63, 65].

Cluster 8 is characterized by the post-hardcore genre. Seven tracks in the top 20 in this cluster are tagged with post-hardcore, five of those have it as their most tagged microgenre. Triphop (8 tracks), screamo (6 tracks), and hardcore (6 tracks) are also well represented among the top 20 in this cluster. Popular tracks include Another Bottle Down by Asking Alexandria, ...Meltdown by Enter Shikari, and Nineteen Fifty Eight by A Day to Remember. Interesting deviations from this post-hardcore association are, for instance, Dexter by Ricardo Villalobos (minimal techno) and Cookie Thumper! by Die Antwoord (hip hop), which are also among the most popular tracks in this cluster.

Summarizing the answer to RQ1, which we addressed here (To what extent can we identify and interpret groups of countries that constitute music preference archetypes, from behavioral traces of users’ music listening records?), we find nine clusters of countries, with each of the clusters representing a music preference archetype that reflects different nuances of music preferences in terms of the Spotify microgenres.
While some music preference archetypes represent countries with geographical proximity (e.g., Cluster 6 and Cluster 8) and some archetypes share linguistic similarities (e.g., Cluster 3 and Cluster 8), others include interesting outliers (e.g., Iceland (IS) in Cluster 0, Israel (IL) in Cluster 4, or Brazil (BR) in Cluster 7).
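The per-cluster microgenre counts reported in this section (how many top tracks have a microgenre as their most associated tag, and how many carry it within their five most associated tags) can be obtained with a simple tally. The sketch below is an assumption about the bookkeeping only: it presumes each track comes with a tag list ordered from most to least associated, and `microgenre_summary` is a hypothetical helper name.

```python
from collections import Counter

def microgenre_summary(track_tags, top_n=5):
    """track_tags: one tag list per track, ordered from most to least associated.

    Returns two counters:
      - how often a microgenre is the most associated tag of a track,
      - how many tracks carry a microgenre within their top_n tags.
    """
    most_associated = Counter(tags[0] for tags in track_tags if tags)
    within_top_n = Counter(
        tag for tags in track_tags for tag in set(tags[:top_n])
    )
    return most_associated, within_top_n

# Toy annotations (assumption), not taken from the released track lists.
toy_tracks = [
    ["indie rock", "alternative rock", "rock"],
    ["indie pop", "electronic"],
    ["indie rock", "pop"],
    [],  # a track without track-level microgenres
]
most_associated, within_top_2 = microgenre_summary(toy_tracks, top_n=2)
```

Tracks without track-level tags (such as the one case noted for Cluster 0) are simply skipped by both counters.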

In the following, we first detail the setup of the conducted evaluation experiments for the music track recommendation task, including evaluation protocol, baselines, and performance metrics (Section 5.2.1). Subsequently, we report and discuss the obtained results and answer the related research questions (Section 5.2.2).

After preselection and filtering (cf. Section 4.1), the dataset contains the listening histories of 54,337 Last.fm users. To carry out the recommendation experiments, we split the data into training, validation, and test sets. For each of validation and test set, 5,000 users are randomly sampled. The original VAE model [47] and our extended VAE architecture that integrates the user context models described in Section 4.3 are trained on the full listening events of the users in the training set. For users in the validation and test set, 80% of all listening events are randomly selected to act as an input for the model, and the remaining 20% are used for evaluation. The NDCG@100 metric (see below) on the validation set is used to select the hyperparameters of our models.
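The evaluation protocol can be sketched as two small steps: holding out users for validation and test, and splitting each held-out user's listening events 80/20. The sketch assumes user ids and events are plain Python sequences; `split_users` and `split_events` are hypothetical names, and the seeds are arbitrary.

```python
import random

def split_users(user_ids, n_heldout, seed=0):
    """Randomly sample n_heldout users each for validation and test;
    all remaining users form the training set."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    validation = shuffled[:n_heldout]
    test = shuffled[n_heldout:2 * n_heldout]
    train = shuffled[2 * n_heldout:]
    return train, validation, test

def split_events(events, input_fraction=0.8, seed=0):
    """For a held-out user: a random input_fraction of the listening events
    is fed to the model, the rest is kept for evaluation."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    cut = int(round(input_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# In the paper's setting this would be 54,337 users with n_heldout=5,000;
# here a toy example with 100 users.
train, validation, test = split_users(range(100), n_heldout=10)
model_input, held_out = split_events(list(range(10)))
```

Training users contribute their full listening history, so only validation and test users pass through `split_events`.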

Baselines: In addition to comparing our extended context-aware model to the original VAE recommendation architecture [47], we also include two baselines in the experiments, i.e., variants of most popular (MP) and implicit matrix factorization (IMF). In the most popular (MP) models, a popularity measure is calculated for each track based on its sum of listening events across users in the training set. We implemented and evaluated three flavors of MP:

MP global computes the most popular tracks on a global scale (independent of country);

MP country considers only the top tracks in the country of the target user;

MP cluster considers only the top tracks within the cluster the country of the target user belongs to.

We then rank tracks accordingly and use the ranking to produce recommendations, which are evaluated on the 20% split of the test set (for each user). To make results between the baseline and our proposed model comparable, we exclude tracks that are part of a user’s known listening history, i.e., listening events from the remaining 80%.

As a second baseline, we adopt a collaborative filtering approach using implicit matrix factorization (IMF) according to Koren et al. [42]. We use the implementation provided by Spotlight with random negative sampling (50:50), 128 latent dimensions, and a pointwise loss function.

Performance metrics: To quantify the accuracy of the recommendations, we use the following metrics [similar to 3, 47, 69], which we report averaged over all users (in the test set). Thus, for each user in the test set, we generate recommendations using the data in the training set and compare the recommended tracks with the actually listened tracks of the user present in the test set in order to calculate the performance metrics. Note that we use the definitions common in recommender systems research, which are partly different from the ones traditionally used in information retrieval.
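As an illustration of how the popularity baselines and the three accuracy metrics (Eqs. 9–12) fit together, the following sketch ranks tracks by summed listening events, filters out a user's known tracks, and scores the resulting list. It assumes training events are `(user, track, playcount)` tuples, a binary relevance function, and a base-2 logarithm in the DCG discount (the base cancels in NDCG); all function names are hypothetical.

```python
import math
from collections import Counter

def most_popular(train_events, group_of=None, target_group=None):
    """Rank tracks by summed listening events. With group_of/target_group,
    only users of the target user's country or cluster are counted
    (MP country / MP cluster); otherwise all users count (MP global)."""
    counts = Counter()
    for user, track, plays in train_events:
        if group_of is None or group_of[user] == target_group:
            counts[track] += plays
    return [track for track, _ in counts.most_common()]

def recommend(ranking, known, k):
    """Top-k of a ranking after excluding the user's known tracks."""
    return [t for t in ranking if t not in known][:k]

def precision_at_k(recs, relevant, k):
    return sum(1 for t in recs[:k] if t in relevant) / k

def recall_at_k(recs, relevant, k):
    denom = min(k, len(relevant))
    return sum(1 for t in recs[:k] if t in relevant) / denom if denom else 0.0

def ndcg_at_k(recs, relevant, k):
    # enumerate() is 0-based, so rank i+1 gets the discount log2((i+1)+1).
    dcg = sum(1.0 / math.log2(i + 2) for i, t in enumerate(recs[:k]) if t in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / idcg if idcg else 0.0

# Toy data (assumption): three users from two countries.
events = [("u1", "a", 5), ("u1", "b", 1), ("u2", "a", 2), ("u2", "c", 4), ("u3", "c", 9)]
country = {"u1": "AT", "u2": "AT", "u3": "BR"}

mp_global = most_popular(events)                      # ranks c, a, b
mp_country = most_popular(events, country, "AT")      # ranks a, c, b
recs = recommend(mp_global, known={"c"}, k=2)         # a, b
scores = (precision_at_k(recs, {"b"}, 2),
          recall_at_k(recs, {"b", "z"}, 2),
          ndcg_at_k(recs, {"b"}, 2))
```

In the experiments, such per-user scores are averaged over all users in the test set.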

Precision@K for user $u$:

$$P@K(u) = \frac{1}{K} \sum_{i=1}^{K} rel(i), \tag{9}$$

where $K$ is the number of recommended items and $rel(i)$ is an indicator function signaling whether the recommended track at rank $i$ is relevant to $u$ or not. This means that $rel(i) = 1$ if the track at rank $i$ can be found in the test set; $rel(i) = 0$ otherwise.

Recall@K for user $u$:

$$R@K(u) = \frac{1}{\min(K, N_u)} \sum_{i=1}^{K} rel(i) \tag{10}$$

where $N_u$ is the number of items in the test set that are relevant to $u$, $K$ is the size of the recommendation list, and $rel(i)$ is the same indicator function as used for Precision@K. When comparing Precision@K and Recall@K, Precision@K can be seen as a measure of the usefulness of recommendations and Recall@K as a measure of the completeness of recommendations.

Normalized discounted cumulative gain@K:

$$NDCG@K(u) = \frac{DCG@K(u)}{IDCG@K(u)} \tag{11}$$

where $IDCG@K(u)$ is the ideal $DCG@K$ for user $u$, achieved when all items relevant to $u$ are ranked at the top, and $DCG@K(u)$ is the discounted cumulative gain at position $K$ for user $u$. It is given by:

$$DCG@K(u) = \sum_{i=1}^{K} \frac{rel(i)}{\log_2(i + 1)} \tag{12}$$

where $rel(i)$ is the same indicator function as used for Precision@K and Recall@K. In contrast to those two performance metrics, NDCG@K is a ranking-based metric, which also takes the position of the recommended tracks into account since higher-ranked items are given more weight.

We compute and report all metrics for K = 10 and K = 100.

Footnote (Spotlight implementation): https://maciejkula.github.io/spotlight/factorization/implicit.html

Table 3 shows the performance achieved on the test set, averaged over all users in the test set. As a general observation, we see that the VAE-based approaches outperform the baselines (MP and IMF) by a substantial extent. Of the baselines, IMF outperforms MP global while the other two variants of MP (MP country and MP cluster) yield better results than IMF. The poor performance of MP global is somewhat surprising since several studies [e.g., 44, 76, 77] have shown that recommendation approaches leveraging popularity information (e.g., always suggesting the items that are most frequently consumed) often achieve highly competitive accuracy values in offline experiments, despite the obvious fact that such recommendations will likely not be perceived as very useful by the users. A likely reason is that we perform track recommendation while the earlier mentioned works commonly adopt an artist recommendation setup. In an artist recommendation scenario, it is very likely that a user has consumed every highly popular artist at least once. This leads to a high performance of a popularity-based approach. In the track recommendation scenario adopted in the work at hand, the granularity of items (tracks vs. artists) is higher and, in comparison to the artist recommendation scenario, it is not necessarily the case that the most popular tracks have been consumed by most users at least once. Overall, a popularity-based approach may work well for artist recommendation but less so for the more fine-grained track recommendation.

On the other hand, we also note that the other two variants (MP country and MP cluster) achieve much better results than MP global, even outperforming the IMF approach. This might be explained by the more narrow but better user-tailored consideration of the country-specific mainstream (cf.
Bauer and Schedl [9]), which is reflected in the computation of most popular tracks in the MP country and MP cluster models.

Comparing the proposed context-aware extensions of the VAE recommendation architecture to the original VAE [47], we observe a clear improvement of all metrics when integrating the user context models. This improvement is achieved irrespective of the actual user model we adopt (models 1–4). Precision@10 increases by 3.4 percentage points (7.1%) from VAE to the best performing VAE context model (model 4) that leverages the distances between users and country centroids. Likewise, Precision@100 increases by 1.7 percentage points (5.5%). Recall@10 and Recall@100 improve, respectively, by a maximum of 3.5 percentage points (7.2%), realized by model 4, and by 1.8 percentage points (4.9%), realized by model 2. In terms of NDCG, the largest gains are realized by VAE context model 2 that incorporates cluster ids. NDCG@10 improves by 3.7 percentage points (7.4%) compared to VAE; NDCG@100 increases by 2.1 percentage points (5.5%).

We investigate statistical significance of the results as follows. For all used metrics (i.e., P@10, P@100, R@10, R@100, NDCG@10, NDCG@100), data is non-normally distributed (Kolmogorov-Smirnov test, p ≤ [...]). [...] p ≤ .05. Furthermore, we perform a pairwise comparison, again using Wilcoxon’s signed-rank test, for each metric and each pair of pure VAE and one of the models integrating context information (i.e., models 1–4). For each metric, each of the models 1–4 outperforms the pure VAE (without context integration) at a significance level of p ≤ .05. Yet, the Friedman test did not indicate any significant differences between the models 1–4 for any of the metrics.

Returning to the original research questions, we answer RQ2 (Which are effective ways to model the users’ geographic background as a contextual factor for music recommendation?) by pointing to the fact that all four proposed user models significantly improve recommendation quality in terms of precision, recall, and NDCG. We note, however, that performance differences between the four user context models are largely negligible. In summary, leveraging country information for music track recommendation (either as country or cluster identifier, or as distances between the target user and each cluster’s centroid) is beneficial compared to not including any country information.

As for RQ3 (How can we extend a state-of-the-art recommendation algorithm to include user context information, in particular, our geo-aware user models?), we proposed an extension of a state-of-the-art recommender based on a VAE architecture [47], i.e., we devised a multi-layer generative model in which contextual features can influence recommendations through a gating mechanism.

To investigate the generalizability of our findings to a dataset with different characteristics, we perform an additional experiment as follows. We estimate performance on a dataset that is more diverse in terms of track popularity than the one that considers only the top 122,442 tracks. More precisely, we create a second dataset by first considering all tracks that have been listened to at least 100 (instead of 1,000) times, yielding 1,012,961 unique tracks. We then randomly sample, three times, exactly the same number of tracks (122,442) as used in our main experiment, and we evaluate the VAE approaches on each randomly sampled subset, averaging performance measures across the three runs. Results can be found in the last five rows of Table 3 (models named “VAE sampling ...”).
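The sampling protocol described above (filter tracks by a minimum playcount, draw a fixed number of tracks several times, average the metrics over the runs) can be sketched as follows. This is only an illustrative sketch and not the released implementation: the function names, the `evaluate` callable, the seed handling, and the toy data are all assumptions made for this example.

```python
import random
from statistics import mean


def eligible_tracks(playcounts, min_plays=100):
    """Return the sorted ids of all tracks with at least `min_plays` plays."""
    return sorted(t for t, count in playcounts.items() if count >= min_plays)


def average_over_samples(playcounts, sample_size, evaluate,
                         n_runs=3, min_plays=100, seed=0):
    """Sample `sample_size` tracks `n_runs` times and average the metrics.

    `evaluate` is a hypothetical callable that takes a set of track ids and
    returns a dict of metric values for a recommender restricted to that set.
    """
    pool = eligible_tracks(playcounts, min_plays)
    if sample_size > len(pool):
        raise ValueError("sample_size exceeds the number of eligible tracks")
    rng = random.Random(seed)  # fixed seed so the runs are repeatable
    runs = [evaluate(set(rng.sample(pool, sample_size))) for _ in range(n_runs)]
    # Average each metric across the independent runs.
    return {name: mean(run[name] for run in runs) for name in runs[0]}


# Toy data: 50 tracks with playcounts 0, 10, ..., 490 (purely illustrative).
toy_playcounts = {f"track_{i}": 10 * i for i in range(50)}

# Stand-in evaluation that only reports the size of the sampled subset.
toy_result = average_over_samples(
    toy_playcounts, sample_size=5, evaluate=lambda s: {"n_tracks": len(s)}
)
print(len(eligible_tracks(toy_playcounts)), toy_result)
```

In the setting of the article, `min_plays` would correspond to the threshold of 100 listening events, `sample_size` to 122,442, and `evaluate` to training and testing a VAE variant on the sampled tracks.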
While we observe an obvious decrease in performance when considering items further down the popularity scale, results are still in line with the findings obtained on the main dataset. In particular, our extended VAE models (models 1–4) still outperform the original VAE architecture with respect to all performance metrics.

In summary, we approached the task of identifying country clusters and corresponding archetypes of music consumption preferences based on behavioral data of music listening that originates from Last.fm users. Together with the users’ self-disclosed country information, we used the listening data (369 million listening events created by 54 thousand Last.fm users) as input to unsupervised learning techniques (t-SNE and OPTICS), allowing us to identify nine archetypal country clusters. We discussed these clusters in detail with respect to their corresponding users’ music preferences on the track level and the linguistic, historical, and cultural backgrounds of the countries in each cluster. Additionally, we considered the distribution of age, gender, and average playcount per user as aspects in our analysis.

Furthermore, we proposed a context-aware music recommendation approach operating on the music track level, which integrates different user models that are based on the user’s country or country cluster. To this end, we extended a variational autoencoder (VAE) architecture by a gating mechanism to add contextual user features. We considered four user models, either encoding the target user’s country information (model 1) or cluster information (model 2) directly, or as a feature vector containing the distances between the target user and all cluster centroids (model 3) or all individual country centroids (model 4).
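To make the four user models more concrete, the following sketch builds the corresponding context vectors for a toy user and applies a simple gate. It rests on several assumptions that are not taken from the article: identifiers are encoded as one-hot vectors, distances are Euclidean, and the gate is an element-wise sigmoid. The actual gating mechanism of the proposed architecture may differ, and all names below are hypothetical.

```python
import math


def one_hot(index, size):
    """Models 1 and 2 (sketch): encode a country or cluster id as one-hot."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def centroid_distances(user_vec, centroids):
    """Models 3 and 4 (sketch): distance from the user to every centroid.

    Euclidean distance is an assumption made for this example.
    """
    return [math.dist(user_vec, c) for c in centroids]


def gate(hidden, context, weights, bias):
    """Hypothetical sigmoid gate driven by the context vector.

    Each hidden unit j is scaled by sigmoid(weights[j] . context + bias[j]),
    so the context can damp or pass individual hidden activations.
    """
    gated = []
    for h, w_row, b in zip(hidden, weights, bias):
        z = sum(w * c for w, c in zip(w_row, context)) + b
        gated.append(h / (1.0 + math.exp(-z)))
    return gated


# Toy example with two clusters in a two-dimensional space.
user = [0.0, 0.0]
cluster_centroids = [[3.0, 4.0], [0.0, 1.0]]

model2_context = one_hot(1, len(cluster_centroids))
model3_context = centroid_distances(user, cluster_centroids)

# With all-zero weights and biases the gate halves every hidden activation.
gated_hidden = gate([2.0, 4.0], model2_context,
                    weights=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
print(model2_context, model3_context, gated_hidden)
```

Models 1 and 4 follow the same pattern, with countries in place of clusters.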
In evaluation experiments, using precision, recall, and NDCG as performance metrics, we showed that all VAE architectures outperformed a popularity-based recommender and implicit matrix factorization, and we observed superior performance of all VAE variants that include context information vis-à-vis VAE without context information, regardless of how country information is encoded in the user model. (Please note that computational limitations prevented us from running experiments on all 1,012,961 tracks, even more so on the entire LFM-1b dataset.)

Yet, this work has potential limitations with respect to the underlying dataset, which we discuss in the following. There are social patterns that define how and why people access music [48]. A dataset containing logs of the interactions with an online platform can, thus, only capture those listening events of people using any form of online music platform. According to López-Sintas et al. [48], music access patterns are structured by an individual’s social position (indicated by education) and life stage (indicated by age). A bias with respect to the users’ social background can therefore be expected for our dataset. For instance, the dataset has a strong community bias towards users in the United States (US), while other countries are less represented. Furthermore, user information is self-reported by the users, which may be prone to errors and may not necessarily reflect the truth. For instance, some users report as their country Antarctica (AQ) or a birth year of 1900, neither of which seems overly plausible, especially in combination [also see Figure 1 in 61]. Moreover, some users show very high playcounts for single tracks that are not popular among other users. This also affects six of the tracks presented in our discussion of the music preference archetypes. For instance, World’s End by Hatsune Miku & Megurine Luka has a playcount of 1,228 generated by a single unique user.
Similarly, One Thing by Runrig and Resemnare by Valeriu Sterian both have exactly one unique listener, who generated a playcount of 4,000 and 3,591, respectively. The track Ariane by Nova has 3 unique users; I Can’t Change [New Song] and To Know You (Alt. Version), both by Sophie Zelmani, have 5 unique users each, whereof almost all playcounts were generated by only one single user. For both songs, this is the same user. Notably, the preferences of the Last.fm users in our dataset towards certain genres also differ from the genre preferences of the population at large. For instance, we found that rap and R&B as well as classical music are substantially underrepresented in Last.fm listening data [68], which we use in the present study. To some extent, these limitations related to the dataset could be alleviated in the future by performing further data cleansing and preprocessing steps, e.g., threshold-based filtering of exorbitant playcounts by a minority of listeners.

Another limitation of the work concerns a characteristic of t-SNE, namely that its cost function is non-convex. This, in turn, may result in a different embedding of data points in the low-dimensional output space when the t-SNE algorithm is run on different software or hardware configurations. Please note that this does not only concern the present work, but potentially the entire (large) body of research that employs t-SNE for visualization. It is, however, an aspect that is barely discussed. We address this issue in the current work by providing exact details on our implementation and used software, and by releasing to the public the source code, parameter configurations, and dataset used in our experiments (cf.
“Accompanying Resources” below). Note that our results are stable for a given machine, software configuration, and parameter setting since we fixed the seed of the random number generator. Running the code on other configurations, however, may result in a slightly different visualization and clustering.

In this work, we used simple mechanisms to integrate country information as context factors into a VAE architecture. While they worked out well, i.e., outperformed a non-context-aware VAE, we expect even better performance with other user models, whose creation will be part of future research. For instance, we contemplate using probabilistic models to describe the likelihood of each user to belong to each cluster (or country), e.g., via Gaussian mixture models. Given the actual country of a user, we could then analyze in more detail users whose stated country is not the country with highest probability. Such a framework could also be used to diversify recommendations according to a user-selected country, fulfilling user intents such as “I want music of my preferred genre, but listened to by Brazilians”.
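As a rough illustration of the probabilistic user model contemplated above, the sketch below computes the posterior probability of each cluster for a single user under a Gaussian mixture whose parameters are already given, and flags users whose stated cluster is not the most probable one. Spherical covariances, the parameter values, and all function names are assumptions made for this example; fitting the mixture (e.g., with expectation-maximization) is not shown.

```python
import math


def membership_probabilities(user_vec, means, variances, weights):
    """Posterior p(cluster | user) under a mixture of spherical Gaussians."""
    dim = len(user_vec)
    log_terms = []
    for mu, var, weight in zip(means, variances, weights):
        sq_dist = sum((x - m) ** 2 for x, m in zip(user_vec, mu))
        log_pdf = -0.5 * (dim * math.log(2.0 * math.pi * var) + sq_dist / var)
        log_terms.append(math.log(weight) + log_pdf)
    # Normalize in log space (log-sum-exp) for numerical stability.
    top = max(log_terms)
    exps = [math.exp(t - top) for t in log_terms]
    total = sum(exps)
    return [e / total for e in exps]


def is_atypical(user_vec, stated_cluster, means, variances, weights):
    """True if the stated cluster is not the one with highest probability."""
    probs = membership_probabilities(user_vec, means, variances, weights)
    return probs.index(max(probs)) != stated_cluster


# Toy mixture with two equally weighted components (illustrative values).
toy_means = [[0.0, 0.0], [4.0, 0.0]]
toy_variances = [1.0, 1.0]
toy_weights = [0.5, 0.5]

near_first = membership_probabilities([0.0, 0.0], toy_means, toy_variances, toy_weights)
midpoint = membership_probabilities([2.0, 0.0], toy_means, toy_variances, toy_weights)
print(near_first, midpoint)
print(is_atypical([0.0, 0.0], 1, toy_means, toy_variances, toy_weights))
```

The same posterior vector could serve as a soft variant of the cluster-based user models, since it replaces a hard cluster id with a distribution over clusters.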

Furthermore, it would be worthwhile to compare the clustering and recommendation results we achieved here on the track level to results achieved when modeling music preferences on the artist level, keeping all other methodological details the same. In particular, since previous studies have predominantly shown that popularity-based music recommendation systems perform well when recommending artists, such a comparison could be enlightening.

Finally, we aim at delving into the possible cultural, historical, or socio-economic reasons that may underlie the differences in music preferences between the identified archetypes. To this end, we will consider theories and insights from cultural sciences, history, sociology, and economics, and connect our music preference archetypes to these theories. Another promising path for further analysis of the country clusters is to consider dimensions rooted in the music market or the music content itself, including considerations such as local demand, production of music styles, reception of music styles, diffusion, etc., as well as dimensions related to the users’ listening habits.

ACCOMPANYING RESOURCES

To foster reproducibility, we release the code and data used in this work to the public. The code can be found at https://gitlab.cp.jku.at/markus/fiai2020_country_clusters; the dataset [64] is available from https://zenodo.org/record/3907362

FUNDING

This research is supported by the Austrian Science Fund (FWF): V579 and the Know-Center GmbH (FFG COMET funding).

ACKNOWLEDGMENTS

The authors would like to thank Peter Müllner from the Know-Center GmbH for providing the IDF calculations of the music tracks.

REFERENCES

[1] Adiyansjah, Alexander A S Gunawan, and Derwin Suhartono. 2019. Music Recommender System Based on Genre using Convolutional Recurrent Neural Networks. Procedia Computer Science 157 (2019), 99–109. https://doi.org/10.1016/j.procs.2019.08.146 The 4th International Conference on Computer Science and Computational Intelligence (ICCSCI 2019): Enabling Collaboration to Escalate Impact of Research Results for Society.
[2] Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-Aware Recommender Systems. In Recommender Systems Handbook (2nd ed.), Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). Springer, 191–226.
[3] Fabio Aiolli. 2013. Efficient Top-n Recommendation for Very Large Scale Binary Rated Datasets. In Proceedings of the 7th ACM Conference on Recommender Systems (Hong Kong, China) (RecSys ’13). ACM, New York, NY, USA, 273–280. https://doi.org/10.1145/2507157.2507189
[4] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander. 1999. OPTICS: Ordering Points to Identify the Clustering Structure. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, Pennsylvania, USA) (SIGMOD ’99). ACM, New York, NY, USA, 49–60. https://doi.org/10.1145/304182.304187
[5] Ferdinand Bada. 2018. Former Portuguese Colonies. WorldAtlas (2018). https://worldatlas.com/articles/former-portuguese-colonies.html
[6] Young Min Baek. 2015. Relationship Between Cultural Distance and Cross-Cultural Music Video Consumption on YouTube. Social Science Computer Review 33, 6 (2015), 730–748. https://doi.org/10.1177/0894439314562184
[7] Zeynep Batmaz, Ali Yurekli, Alper Bilge, and Cihan Kaleli. 2019. A review on deep learning for recommender systems: challenges and remedies. Artificial Intelligence Review 52, 1 (01 Jun 2019), 1–37. https://doi.org/10.1007/s10462-018-9654-y
[8] Christine Bauer and Markus Schedl. 2018. On the importance of considering country-specific aspects on the online-market: an example of music recommendation considering country-specific mainstream. In [...] (Waikoloa, Big Island, HI, USA, 3–6 January 2018). 3647–3656. http://hdl.handle.net/10125/50349
[9] Christine Bauer and Markus Schedl. 2019. Global and country-specific mainstreaminess measures: Definitions, analysis, and usage for improving personalized music recommendation systems. PLOS ONE 14, 6 (06 2019), 1–36. https://doi.org/10.1371/journal.pone.0217389
[10] Christine Bauer and Eva Zangerle. 2019. Leveraging Multi-Method Evaluation for Multi-Stakeholder Settings. In [...] (Copenhagen, Denmark, 19 September) (ImpactRS ’19, Vol. 2462), Oren Sar Shalom, Dietmar Jannach, and Ido Guy (Eds.). Ceur-ws.org. http://ceur-ws.org/Vol-2462/short3.pdf
[11] David Beer. 2013. Genre, Boundary Drawing and the Classificatory Imagination. Cultural Sociology 7, 2 (2013), 145–160. https://doi.org/10.1177/1749975512473461
[12] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, Miami, Florida, USA, October 24-28, 2011, Anssi Klapuri and Colby Leider (Eds.). University of Miami, 591–596. http://ismir2011.ismir.net/papers/OS6-1.pdf
[13] Arielle Bonneville-Roussy, P. J. Rentfrow, M. K. Xu, and J. Potter. 2013. Music through the ages: Trends in musical engagement and preferences from adolescence through middle adulthood. Journal of Personality and Social Psychology
[...] Musicae Scientiae 22, 2 (2018), 175–195. https://doi.org/10.1177/1029864917704016
[15] Romain Brisson and Renzo Bianchi. 2019. On the relevance of music genre-based analysis in research on musical tastes. Psychology of Music (2019). https://doi.org/10.1177/0305735619828810
[16] Bethany Bryson. 1997. What about the univores? Musical dislikes and group-based identity construction among Americans with low levels of education. Poetics 25, 2-3 (1997), 141–156.
[17] Oliver Budzinski and Julia Pannicke. 2017. Do preferences for pop music converge across countries?: Empirical evidence from the Eurovision Song Contest. Creative Industries Journal 10, 2 (2017), 168–187. https://doi.org/10.1080/17510694.2017.1332451
[18] Central Intelligence Agency. 2019. Languages. The World Factbook
[...] (Shinjuku, Tokyo, Japan) (SIGIR ’17). ACM, New York, NY, USA, 655–664. https://doi.org/10.1145/3077136.3080772
[20] A. Colley. 2008. Young people’s musical taste: Relationship with gender and gender-related traits. Journal of Applied Social Psychology 38, 8 (2008), 2039–2055. https://doi.org/10.1111/j.1559-1816.2008.00379.x
[21] Philippe Coulangeon. 2003. La stratification sociale des goûts musicaux. Le modèle de la légitimité culturelle en question. Revue Française de Sociologie 44, 1 (2003), 3–33. https://doi.org/10.3917/rfs.441.0003
[22] Philippe Coulangeon. 2005. Social Stratification of Musical Tastes: Questioning the Cultural Legitimacy Model. Revue Française de Sociologie 46 (5 2005), 123–154. https://doi.org/10.3917/rfs.465.0123
[23] Philippe Coulangeon and Ionela Roharik. 2005. Testing the “Omnivore/Univore” Hypothesis in a Cross-National Perspective. On the Social Meaning of Ecletism in Musical Tastes. In The Summer Meeting of the ISA RC28, UCLA. https://hal-sciencespo.archives-ouvertes.fr/hal-01053502
[24] Sally Jo Cunningham, J. Stephen Downie, and David Bainbridge. 2005. “The Pain, The Pain”: Modelling Music Information Behavior And The Songs We Hate. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005). London, UK, 474–477.
[25] Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach. 2019. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In Proceedings of the 13th ACM Conference on Recommender Systems (Copenhagen, Denmark) (RecSys ’19). Association for Computing Machinery, New York, NY, USA, 101–109. https://doi.org/10.1145/3298689.3347058
[26] Paul DiMaggio. 1982. Cultural entrepreneurship in nineteenth-century Boston: the creation of an organizational base for high culture in America. Media, Culture & Society 4, 1 (1982), 33–50. https://doi.org/10.1177/016344378200400104
[27] Ulrich Dolata. 2013. The transformative capacity of new technologies: A theory of sociotechnical change. Routledge Advances in Sociology, Vol. 96. Routledge, London, United Kingdom.
[28] Timothy C.G. Fisher and Stephen B. Preece. 2003. Evolution, extinction, or status quo? Canadian performing arts audiences in the 1990s. Poetics
[...] J. Amer. Statist. Assoc.
[...] Review of Economic Research on Copyright Issues 15, 1 (2018), 20–37. https://ssrn.com/abstract=3243751
[31] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53 (2011), 217–288. https://doi.org/10.1137/090771806
[32] John F Helliwell, Peter RG Layard, and Jeffrey Sachs. 2016. World happiness report 2016 update. Sustainable Development Solutions Network.
[33] Geert Hofstede, Gert Jan Hofstede, and Michael Minkov. 2005. Cultures and organizations: Software of the mind. Vol. 2. McGraw-Hill, New York, NY, USA.
[34] Morris B. Holbrook, Michael J. Weiss, and John Habich. 2002. Disentangling Effacement, Omnivore, and Distinction Effects on the Consumption of Cultural Activities: An Illustration. Marketing Letters 13, 4 (2002), 345–357. https://doi.org/10.1023/A:1020322600709

[35] Max Horkheimer and Theodor W Adorno. 1972. Dialectic of Enlightenment. Seabury Press, New York, NY, USA.
[36] Brian J. Hracs, Michael Seman, and Tarek E. Virani. 2017. The Production and Consumption of Music in the Digital Age. Routledge, New York, NY, USA.
[37] Maura Johnston. 2018. How Spotify Discovers the Genres of Tomorrow. (2018). https://artists.spotify.com/blog/how-spotify-discovers-the-genres-of-tomorrow
[38] Karen Spärck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28 (1972), 11–21.
[39] Michael I Jordan, Zoubin Ghahramani, Tommi S Jaakkola, and Lawrence K Saul. 1999. An introduction to variational methods for graphical models. Machine learning 37, 2 (1999), 183–233.
[40] Tally Katz-Gerro. 1999. Cultural Consumption and Social Stratification: Leisure Activities, Musical Tastes, and Social Location. Sociological Perspectives 42, 4 (1999), 627–646. https://doi.org/10.2307/1389577
[41] Tally Katz-Gerro. 2002. Highbrow Cultural Consumption and Class Distinction in Italy, Israel, West Germany, Sweden, and the United States. Social Forces 81, 1 (09 2002), 207–229. https://doi.org/10.1353/sof.2002.0050 https://academic.oup.com/sf/article-pdf/81/1/207/6519885/81-1-207.pdf
[42] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37. https://doi.org/10.1109/MC.2009.263
[43] Mark A Kramer. 1991. Nonlinear principal component analysis using autoassociative neural networks. AIChE journal 37, 2 (1991), 233–243.
[44] Chin-Hui Lai, Shin-Jye Lee, and Hung-Ling Huang. 2019. A social recommendation method based on the integration of social relationship and product popularity. International Journal of Human-Computer Studies 121 (2019), 42–57. https://doi.org/10.1016/j.ijhcs.2018.04.002
[45] Jin Ha Lee and Xiao Hu. 2014. Cross-cultural Similarities and Differences in Music Mood Perception. In iConference 2014. iSchools, 249–269. https://doi.org/10.9776/14081
[46] Lawrence W Levine. 1988. Highbrow/lowbrow: the emergence of cultural hierarchy in America. Harvard University Press, Cambridge, MA, USA.
[47] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 689–698. https://doi.org/10.1145/3178876.3186150
[48] Jordi López-Sintas, Àngel Cebollada, Nela Filimon, and Abaghan Gharhaman. 2014. Music access patterns: A social interpretation. Poetics 46 (2014), 56–74. https://doi.org/10.1016/j.poetic.2014.09.003
[49] Joshua L. Moore, Thorsten Joachims, and Douglas Turnbull. 2014. Taste Space Versus the World: An Embedding Analysis of Listening Habits and Geography. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014). Taipei, Taiwan.
[50] Steven J. Morrison and Steven M. Demorest. 2009. Cultural constraints on music perception and cognition. Progress in Brain Research 178 (2009), 67–77. https://doi.org/10.1016/S0079-6123(09)17805-6
[51] Massimiliano Nuccio, Marco Guerzoni, and Tally Katz-Gerro. 2018. Beyond Class Stratification: The Rise of the Eclectic Music Consumer in the Modern Age. Cultural Sociology 12, 3 (2018), 343–367. https://doi.org/10.1177/1749975518786039
[52] Richard A. Peterson and Roger M. Kern. 1996. Changing Highbrow Taste: From Snob to Omnivore. American Sociological Review
[...] Cultivating differences: Symbolic boundaries and the making of inequality. Vol. 152. University of Chicago Press, Chicago, IL, USA, Chapter 7, 152–186.
[54] Michael Pichl, Eva Zangerle, Günther Specht, and Markus Schedl. 2017. Mining Culture-Specific Music Listening Behavior from Social Media Data. In [...]. IEEE, 208–215. https://doi.org/10.1109/ISM.2017.35
[55] P. J. Rentfrow and S. D. Gosling. 2003. The do re mi’s of everyday life: the structure and personality correlates of music preferences. Journal of Personality and Social Psychology 84, 6 (2003), 1236–1256. https://doi.org/10.1037/0022-3514.84.6.1236
[56] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). 2015. Recommender Systems Handbook (2nd ed.). Springer.
[57] Gabriel Rossman and Richard A. Peterson. 2015. The instability of omnivorous cultural taste over time. Poetics 52 (2015), 139–153. https://doi.org/10.1016/j.poetic.2015.05.004
[58] Diego Sánchez-Moreno, Ana B. Gil González, M. Dolores Muñoz Vicente, Vivian F. López Batista, and María N. Moreno García. 2016. A collaborative filtering method for music recommendation using playing coefficients for artists and users. Expert Systems with Applications 66 (2016), 234–244. https://doi.org/10.1016/j.eswa.2016.09.019
[59] Thomas Schäfer and Claudia Mehlhorn. 2017. Can personality traits predict musical style preferences? A meta-analysis. Personality and Individual Differences 116 (2017), 265–273. https://doi.org/10.1016/j.paid.2017.04.061
[60] Markus Schedl. 2016. The LFM-1B Dataset for Music Retrieval and Recommendation. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval (New York, New York, USA) (ICMR ’16). ACM, New York, NY, USA, 103–110. https://doi.org/10.1145/2911996.2912004
[61] Markus Schedl. 2017. Investigating country-specific music preferences and music recommendation algorithms with the LFM-1b dataset. International Journal of Multimedia Information Retrieval 6, 1 (March 2017), 71–84. https://doi.org/10.1007/s13735-017-0118-y
[62] Markus Schedl. 2019. Deep Learning in Music Recommendation Systems. Frontiers in Applied Mathematics and Statistics
[...] (Como, Italy, 27 August) (KidRec ’17). arXiv:1912.11564 [cs.IR] https://arxiv.org/abs/1912.11564
[64] Markus Schedl, Christine Bauer, Wolfgang Reisinger, Dominik Kowald, and Elisabeth Lex. 2020. The dataset used in the article “Listener Modeling and Context-aware Music Recommendation Based on Country Archetypes”. https://doi.org/10.5281/zenodo.3907362
[65] Markus Schedl and Bruce Ferwerda. 2017. Large-Scale Analysis of Group-Specific Music Genre Taste from Collaborative Tags. In [...] (Taichung, Taiwan, 11–13 December) (ISM ’17). IEEE, 479–482. https://doi.org/10.1109/ISM.2017.95
[66] Markus Schedl, Peter Knees, Brian McFee, Dmitry Bogdanov, and Marius Kaminskas. 2015. Music Recommender Systems. In Recommender Systems Handbook (2nd ed.), Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). Springer, 453–492.
[67] Markus Schedl, Florian Lemmerich, Bruce Ferwerda, Marcin Skowron, and Peter Knees. 2017. Indicators of Country Similarity in Terms of Music Taste, Cultural, and Socio-economic Factors. In [...]. IEEE, 308–311. https://doi.org/10.1109/ISM.2017.55
[68] Markus Schedl and Marko Tkalcic. 2014. Genre-based Analysis of Social Media Data on Music Listening Behavior: Are Fans of Classical Music Really Averse to Social Media? In Proceedings of the First International Workshop on Internet-Scale Multimedia Management, WISMM ’14, Orlando, Florida, USA, November 7, 2014, Roger Zimmermann and Yi Yu (Eds.). ACM, 9–13. https://doi.org/10.1145/2661714.2661717
[69] Markus Schedl, Hamed Zamani, Ching-Wei Chen, Yashar Deldjoo, and Mehdi Elahi. 2018. Current challenges and visions in music recommender systems research. International Journal of Multimedia Information Retrieval 7, 2 (June 2018), 95–116. https://doi.org/10.1007/s13735-018-0154-2
[70] A. Singhi and D. G. Brown. 2014. On Cultural, Textual and Experiential Aspects of Music Mood. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR ’14). Taipei, Taiwan, 3–8.
[71] Marcin Skowron, Florian Lemmerich, Bruce Ferwerda, and Markus Schedl. 2017. Predicting Genre Preferences from Cultural and Socio-Economic Factors for Music Retrieval. In Advances in Information Retrieval, Joemon M Jose, Claudia Hauff, Ismail Sengor Altıngovde, Dawei Song, Dyaa Albakour, Stuart Watt, and John Tait (Eds.). Springer International Publishing, Cham, Germany, 561–567.
[72] John Sonnett. 2016. Ambivalence, indifference, distinction: A comparative netfield analysis of implicit musical boundaries. Poetics
[...] Topics in Cognitive Science 4, 4 (2012), 653–667. https://doi.org/10.1111/j.1756-8765.2012.01215.x
[75] T.F.M. ter Bogt, M.J.M.H. Delsing, M. van Zalk, P.G. Christenson, and W.H.J. Meeus. 2011. Intergenerational continuity of taste: parental and adolescent music preferences. Social Forces 90, 1 (2011), 297–319.
[76] Sunita Tiwari, Manjeet Singh Pangtey, and Sushil Kumar. 2018. Location Aware Personalized News Recommender System Based on Twitter Popularity. In Computational Science and Its Applications – ICCSA 2018, Osvaldo Gervasi, Beniamino Murgante, Sanjay Misra, Elena Stankova, Carmelo M. Torre, Ana Maria A.C. Rocha, David Taniar, Bernady O. Apduhan, Eufemia Tarantino, and Yeonseung Ryu (Eds.). Springer, Cham, 650–658.
[77] Andreu Vall, Massimo Quadrana, Markus Schedl, and Gerhard Widmer. 2019. Order, context and popularity bias in next-song recommendations. International Journal of Multimedia Information Retrieval 8, 2 (June 2019), 101–113. https://doi.org/10.1007/s13735-019-00169-8
[78] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research
[...] Poetics 37, 4 (2009), 315–332. https://doi.org/10.1016/j.poetic.2009.06.005
[80] G. Vigliensoni and I. Fujinaga. 2016. Automatic music recommendation systems: do demographic, profiling, and contextual features improve their performance? In [...]. 94–100.
[81] Jef Vlegels and John Lievens. 2017. Music classification, genres, and taste patterns: A ground-up network analysis on the clustering of artist preferences. Poetics 60 (2017), 76–89. https://doi.org/10.1016/j.poetic.2016.08.004
[82] Frank Wilcoxon. 1945. Individual Comparisons by Ranking Methods. Biometrics Bulletin
[...] Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization (UMAP 2018). ACM, Singapore, Singapore, 357–358. https://doi.org/10.1145/3209219.3209258


| Cluster no. | Track title | Artist | Playcount within cluster | Track genres |
|---|---|---|---|---|
| 0 | Mr. Brightside | The Killers | 4248 | rock, indie rock, alternative rock |
| 0 | Uprising | Muse | 3955 | alternative rock, rock, progressive rock |
| 0 | I Bet You Look Good on the Dancefloor | Arctic Monkeys | 3835 | indie rock, rock, alternative rock |
| 0 | Fluorescent Adolescent | Arctic Monkeys | 3772 | indie rock, rock, alternative rock |
| 0 | VCR | The xx | 3597 | electronic, indie rock, indie pop |
| 0 | Reptilia | The Strokes | 3394 | indie rock, rock, alternative rock |
| 0 | Mardy Bum | Arctic Monkeys | 3345 | indie rock, rock, alternative rock |
| 0 | Hoppípolla | Sigur Rós | 3336 | post-rock, ambient, post-rock |
| 0 | There Is a Light That Never Goes Out | The Smiths | 3289 | new wave, rock, brit-pop |
| 0 | Teardrop | Massive Attack | 3260 | triphop, electronic, downtempo |
| 1 | Set Fire to the Rain | Adele | 20460 | soul, pop, singer/songwriter |
| 1 | Little Lion Man | Mumford & Sons | 20160 | folk, indie folk, banjo |
| 1 | Otherside | Red Hot Chili Peppers | 19469 | rock, alternative rock, funk |
| 1 | Radioactive | Imagine Dragons | 19338 | rock, indie rock, alternative rock |
| 1 | VCR | The xx | 19220 | electronic, indie rock, indie pop |
| 1 | Heart Skipped a Beat | The xx | 19004 | electronic, indie rock, rock |
| 1 | Teardrop | Massive Attack | 18810 | triphop, electronic, downtempo |
| 1 | Sail | AWOLNATION | 18728 | electronic, rock, indie rock |
| 1 | The Pretender | Foo Fighters | 18636 | rock, alternative rock, grunge |
| 1 | Cosmic Love | Florence + the Machine | 18486 | indie pop, rock, pop |
| 2 | There Is a Light That Never Goes Out | The Smiths | 7479 | new wave, rock, brit-pop |
| 2 | Mr. Brightside | The Killers | 7128 | rock, indie rock, alternative rock |
| 2 | Little Lion Man | Mumford & Sons | 6979 | folk, indie folk, banjo |
| 2 | R U Mine? | Arctic Monkeys | 6408 | indie rock, rock, alternative rock |
| 2 | I Bet You Look Good on the Dancefloor | Arctic Monkeys | 6302 | indie rock, rock, alternative rock |
| 2 | I Miss You | blink-182 | 6295 | rock, punk, pop-punk |
| 2 | Teardrop | Massive Attack | 6187 | triphop, electronic, downtempo |
| 2 | The Cave | Mumford & Sons | 6150 | folk, indie folk, banjo |
| 2 | VCR | The xx | 6147 | electronic, indie rock, indie pop |
| 2 | Harder Better Faster Stronger | Daft Punk | 6083 | electronic, house, electronica |
| 3 | It Ain’t Cool To Be Crazy About You | George Strait | 19048 | country, traditional country |
| 3 | Electric Feel | MGMT | 18108 | electronic, electronica, indie pop |
| 3 | Little Lion Man | Mumford & Sons | 17089 | folk, indie folk, banjo |
| 3 | Time to Pretend | MGMT | 16802 | electronic, indietronica, electronica |
| 3 | Flume | Bon Iver | 16032 | folk, singer/songwriter, indie folk |
| 3 | In the Aeroplane Over the Sea | Neutral Milk Hotel | 15753 | indie rock, folk, lofi |
| 3 | Midnight City | M83 | 15635 | electronic, electro-pop, electro |
| 3 | 1901 | Phoenix | 15591 | indie pop, electronic, indie rock |
| 3 | Such Great Heights | The Postal Service | 15481 | electronic, indie pop, electronica |
| 3 | The Cave | Mumford & Sons | 15412 | folk, indie folk, banjo |
| 4 | Mephisto | Dead Can Dance | 2468 | ambient, medieval, folk |
| 4 | 3 Libras | A Perfect Circle | 1284 | alternative rock, progressive rock, rock |
| 4 | Ariane | Nova | 1238 | – |
| 4 | World’s End | Hatsune Miku & Megurine Luka | 1228 | – |
| 4 | Mr. Brightside | The Killers | 1109 | rock, indie rock, alternative rock |
| 4 | Las Fuerzas | Dënver | 1080 | – |
| 4 | Jeremy | Pearl Jam | 1069 | grunge, rock, alternative rock |
| 4 | Reckoner | Radiohead | 1064 | alternative rock, rock, experimental |
| 4 | Them Bones | Alice in Chains | 1057 | grunge, rock, alternative rock |
| 4 | Nude | Radiohead | 1050 | alternative rock, rock, electronic |
| 5 | Häaden Two | Robert Fripp | 11616 | – |
| 5 | The Smile | Phase | 7898 | alternative rock, progressive rock, art rock |
| 5 | Ibidem | Phase | 7858 | alternative rock, art rock, rock |
| 5 | Perdition | Phase | 7752 | rock, psychedelic rock, progressive rock |
| 5 | Transcendence | Phase | 7690 | psychedelic rock, rock, alternative rock |
| 5 | Hypoxia | Phase | 7614 | psychedelic rock, rock, alternative rock |
| 5 | Static | Phase | 6988 | rock, progressive rock, space rock |
| 5 | A Void | Phase | 6913 | rock, alternative rock, indie rock |
| 5 | Static (Live) | Phase | 6877 | progressive rock, psychedelic rock, rock |
| 5 | Evening On My Dark Hillside | Phase | 6793 | psychedelic rock, rock, alternative rock |
| 6 | If I Could | Sophie Zelmani | 13420 | singer/songwriter, pop, folk |
| 6 | I Can’t Change [New Song] | Sophie Zelmani | 13409 | – |
| 6 | Without God | Katatonia | 8024 | doom metal, metal, death metal |
| 6 | Day | Katatonia | 7947 | doom metal, metal, progressive metal |
| 6 | Lady Of The Summer Night | Omega | 6787 | rock |
| 6 | Sorrow | Pink Floyd | 6485 | progressive rock, rock, classic rock |
| 6 | Equinoxe Part 5 | Jean Michel Jarre | 6457 | ambient, electronic rock |
| 6 | Gammapolis | Omega | 5958 | classic rock, progressive rock, space rock |
| 6 | To Know You | Sophie Zelmani | 4783 | singer/songwriter, folk, pop |
| 6 | To Know You (Alt. Version) | Sophie Zelmani | 4641 | – |
| 7 | Set Fire to the Rain | Adele | 17247 | soul, pop, singer/songwriter |
| 7 | Fluorescent Adolescent | Arctic Monkeys | 13007 | indie rock, rock, alternative rock |
| 7 | Parade | Garbage | 11770 | rock, alternative rock, pop |
| 7 | National Anthem | Lana Del Rey | 11602 | indie pop, pop, triphop |
| 7 | Skyscraper | Demi Lovato | 11451 | pop, pop-rock, disney |
| 7 | Come & Get It | Selena Gomez | 11387 | pop, electro-pop, dubstep |
| 7 | Pumped Up Kicks | Foster the People | 11171 | indie pop, pop, indie rock |
| 7 | Dark Paradise | Lana Del Rey | 11056 | pop, indie pop, chamber-pop |
| 7 | Heart Attack | Demi Lovato | 10606 | pop, electro-pop, pop-rock |
| 7 | You Only Live Once | The Strokes | 10501 | indie rock, rock, alternative rock |
| 8 | Another Bottle Down | Asking Alexandria | 19779 | post-hardcore, metal-core, screamo |
| 8 | Only You | Savage | 17657 | disco, pop, new wave |
| 8 | ...Meltdown | Enter Shikari | 16320 | post-hardcore, trance-core, electronic |
| 8 | What You Want | Evanescence | 12345 | rock, alternative metal, gothic rock |
| 8 | Gandhi Mate Gandhi | Enter Shikari | 12273 | post-hardcore, electronic, dubstep |
| 8 | Dexter | Ricardo Villalobos | 11889 | minimal, minimal techno, electronic |
| 8 | Paradise Circus | Massive Attack | 9922 | triphop, electronic, downtempo |
| 8 | Teardrop | Massive Attack | 9891 | triphop, electronic, downtempo |
| 8 | Kill Mercy Within | Korn | 9484 | numetal, electronic, dubstep |
| 8 | Seven Nation Army | The White Stripes | 9380 | rock, alternative rock, indie rock |

Table 2. The 10 most popular tracks per cluster. Playcount refers to the total number of listening events by the users in each cluster.
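
To make the playcount definition in the caption concrete, the following minimal sketch shows one way such a per-cluster ranking could be computed from raw listening events. The function name `top_tracks_per_cluster`, the `(user_id, track_id)` event layout, and the `user_cluster` mapping are hypothetical and chosen only for illustration; they do not reflect the pipeline actually used to produce the table.

```python
from collections import Counter, defaultdict


def top_tracks_per_cluster(events, user_cluster, k=10):
    """Return the k most-played tracks for each cluster.

    events:       iterable of (user_id, track_id) pairs, one per listening event
                  (hypothetical layout, assumed for this sketch)
    user_cluster: dict mapping user_id -> cluster label (assumed to be given)
    """
    counts = defaultdict(Counter)
    for user_id, track_id in events:
        cluster = user_cluster.get(user_id)
        if cluster is None:
            continue  # ignore events of users without a cluster assignment
        # Playcount within a cluster: total listening events by its users
        counts[cluster][track_id] += 1
    return {c: counter.most_common(k) for c, counter in counts.items()}
```

Called with `k=10`, the returned dictionary holds, for every cluster, a list of `(track_id, playcount)` pairs in descending order of playcount, which corresponds to one block of rows in the table.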

Model P@10 P@100 R@10 R@100 NDCG@10 NDCG@100