Individualized Context-Aware Tensor Factorization for Online Games Predictions
2020 International Conference on Data Mining Workshops (ICDMW)
Julie Jiang∗†, Kristina Lerman∗†, Emilio Ferrara∗†‡
∗Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292
†Department of Computer Science, University of Southern California, Los Angeles, CA 90089
‡Annenberg School of Communication, University of Southern California, Los Angeles, CA 90089
Abstract—Individual behavior and decisions are substantially influenced by context, such as location, environment, and time. Changes along these dimensions can be readily observed in Multiplayer Online Battle Arena (MOBA) games, where players face different in-game settings for each match and are subject to frequent game patches. Existing methods utilizing contextual information generalize the effect of a context over the entire population, but contextual information tailored to each individual can be more effective. To achieve this, we present the Neural Individualized Context-aware Embeddings (NICE) model for predicting user performance and game outcomes. Our proposed method identifies individual behavioral differences in different contexts by learning latent representations of users and contexts through non-negative tensor factorization. Using a dataset from the MOBA game League of Legends, we demonstrate that our model substantially improves the prediction of winning outcomes, individual user performance, and user engagement.
I. INTRODUCTION
Understanding human behavior through data is an active area of research in machine learning. Most data-driven methods generalize known behavioral patterns to unknown ones, typically framing the problem as an unsupervised or supervised prediction task. However, the heterogeneity of human behavior and contextual factors complicates the data formulation. In this paper, we consider the multiplayer online battle arena (MOBA) game League of Legends (LoL), where the multitude of human interactions and decision-making mirrors society in real life. The growing interest in online games and the wealth of available gameplay data have raised the possibility of modeling player performance through a data-driven approach. An LoL game is a standalone match in which two teams of players compete by trying to destroy the opposing team's base first. At the start of each match, each player assumes a role by controlling a champion as part of their team's attack or defense strategy, which in turn depends largely on the capabilities of their chosen champion. Despite constraints in the gameplay environment, predicting the outcomes of online games is challenging due to variability in player skills and changing game contexts. Individual players differ in their skill level, their mastery of a specific champion style, their game style preferences, and their adaptability to different roles and situations. Even the hardware they play on, the time lag caused by a poor internet connection, or their geographical region could partly account for their performance.
In addition, game context changes globally as a function of the game version, both at the team level, based on the queue or tournament types players compete in, and at the champion level, based on the upgrades or downgrades of a champion's skills and abilities.

To address these challenges, we develop NICE (Neural Individualized Context-Aware Embeddings) to enable user-context modeling that is applicable to many user behavior datasets. The underlying technique we employ is tensor factorization, which has emerged as a powerful tool for unsupervised, multi-modal analysis of high-dimensional data [1]. As a generalization of matrix factorization, tensor factorization enables n-way analyses of multi-linear data that better capture their structural representations. The main challenge limiting application-specific tensor factorization is constructing valid tensors from high-order data. This is especially true for the LoL dataset, given the large number of users, the varying number of matches played by each user, and the varying time points at which each user chooses to play. In this paper, we describe a generalized approach to constructing and using tensor factorization methods. In particular, we demonstrate that NICE works well with multiple types of contexts, such as contexts that apply globally to all users and contexts that apply separately to each individual. By considering the population as a whole in our tensor, the learned latent factors elucidate meaningful behavioral patterns specific to each individual. Finally, we show that the latent factors are suitable for various downstream applications via a deep neural network decoder. A unique advantage of NICE is that it combines high predictive power with high interpretability in a contextually aware setting.
We evaluate our approach by predicting match outcomes and player performance for tens of thousands of LoL players and nearly half a million matches. We demonstrate how our model can be effectively utilized in a wide range of applications, from anticipating outcomes to estimating performance to predicting user engagement, and it systematically outperforms the baseline models. We also provide an in-depth explanation of the latent factors we reveal, which offers insights into the heterogeneity of users and the effects of various contextual settings.

Fig. 1. A line plot with error bars of the average number of kills, deaths, assists, and KDA attained by champions in each of the seven champion types across game versions. The error bars show the standard error of the mean.
II. RELATED WORK
Contextual awareness is important in many machine learning applications. The ability to understand the context and behave accordingly is as useful and crucial as model interpretability and generalizability, two topics that attract much attention. Although there exists research on promoting contextual awareness in machine learning models, a primary limitation of current methods is their inability to generalize to multiple and categorical context variables, of which there are many in the real world. One direct approach is to model one type of context, e.g., time, as an additional dimension in a collaborative-filtering-based method [2]. In another example, [3] developed a localized context-aware method by training separate models for each context. However, these methods do not naturally generalize to cases with multiple contextual factors. We approach contextualization using tensor factorization, for which there exists a broad body of research focusing on its analytical potential and applications. The interpretability of factorization methods has enabled their adoption to understand human behavior [4]–[6] and to analyze complex temporal activities or signals [7], [8]. We also draw inspiration from tensor factorization applications such as e-mail topic tracking [9] and patient clustering [10]. Pertaining to game-related studies, [4] mined user behavioral patterns in LoL using tensor factorization by measuring observable in-game performance. The majority of MOBA game research has focused on predicting factors influencing the outcome. Some tackle the problem by finding the optimal team composition to maximize the chance of winning. For example, [11] explores the best teammate combinations, and [12], [13] examined the most advantageous hero (i.e., champion) lineups in Dota 2. [14] used neural networks to extract player behavioral patterns to predict match outcomes. These methods do not simultaneously consider the individual variability of users and the characteristics of the champions they select, and therefore lack contextual awareness. NICE stands out among these methods as a tool not only to predict outcomes but also to understand the underlying human behavior in an unsupervised manner.

III. THE LEAGUE OF LEGENDS DATASET
A. Data Collection and Preprocessing
We use an LoL dataset collected from April 2014 to November 2016 [15]. We preprocess the data to select only players who have a minimum of 15 matches within the time frame. First, this avoids statistical bias due to users who quit early on. Second, our method requires sufficient data points for each user to obtain a meaningful profile of the user's playing habits. The final dataset consists of 18,713 players in 436,805 matches, totaling about 1 million unique player-match pairs.
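The minimum-match filter can be sketched as follows. This is a minimal pure-Python illustration; the record layout and variable names are our own, and only the 15-match threshold comes from the paper:

```python
from collections import Counter

# Toy stand-in for the raw player-match records; the tuple layout is an assumption.
matches = [("player_a", m) for m in range(16)] + [("player_b", m) for m in range(3)]

MIN_MATCHES = 15  # the paper's threshold: keep players with at least 15 matches

# Count matches per player, then keep only records of sufficiently active players.
match_counts = Counter(player for player, _ in matches)
kept = [(p, m) for p, m in matches if match_counts[p] >= MIN_MATCHES]

print(sorted({p for p, _ in kept}))  # ['player_a'] -- player_b is filtered out
```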
B. Feature Selection
The LoL dataset contains match metadata and individual player data. Match metadata includes the queue type, map ID, version number, and season number, as well as the date, time, duration, and the team-based outcome of the match (win or lose). There are also in-game performance statistics for each player, such as the number of kills, deaths, and assists. The dataset also records each player's choice of champion, the lane of attack, and the role in the team, which we collectively deem their individual contextual factors. Next, we briefly describe the meaning of the features we will be using.

The queue type determines how the teams were formed. About half of the matches were solo-queue matches, meaning the teams comprised solo players who are likely strangers to each other, as opposed to queues where teams were partially or fully formed by the players themselves. About 60% of the matches were draft queues, in which players can restrict the champions the opposing team can pick for a more competitive game. Finally, almost all of the matches were ranked queues. The outcome of a ranked queue match goes toward the final ranking of a player at the end of a playing season, which is a yearlong process. On some game maps, players can play in 3v3 mode, but the 5v5 maps are more popular.

At the start of each match, players determine how they contribute to the overall layout of the team. This is a combination of choosing the lane on the map (e.g., top or bottom lane) and the role they play as part of the team (e.g., support or carry). Many of these choices conventionally depend on the champion they select: experienced players informally agree on the best lane or role combination that pairs with each champion, also known as the game's meta or metagame [16]. The 133 champions available during this time period are broadly categorized as Controllers (17 champions), Fighters (27), Mages (23), Marksmen (21), Slayers (19), Tanks (20), and Unique Playstyles (6). Since the dataset offers only the champion name, we match champions with champion types according to the official list released in 2016.

Finally, the in-game performance measures include the number of kills, deaths, assists, gold earned, gold spent, and champion level for each player in each match. Champion level is a score that a player accumulates throughout a match, with higher levels unlocking rewards and enhancing abilities. The champion level logged in the dataset is the final level attained at the end of the match. We additionally compute the kills-deaths-assists (KDA) ratio, defined as (kills + assists) / (deaths + 1), as an overall indicator of user performance.

Like most MOBA games, LoL is closely monitored by its developers, who regularly release patches about every two weeks. One of the main reasons for patches is to alter the dynamics of the game to ensure fair play: no single gameplay strategy or champion choice should dominate. Patches can "nerf" certain champions by decreasing the power level of their abilities and skills, or "buff" them through enhancements. Another reason for patches is to shake up the game in preparation for new seasons. LoL seasons commonly last from January to November of each year and are followed by a short pre-season period. Pre-seasons are typically when the largest overhauls to maps, champions, and other game dynamics are introduced. To incorporate game changes as features, we include the patch number as a global contextual change point. There were 62 patches within the time frame of our dataset, from season 2014 to season 2016.

C. Contextual Impact in LoL
Many in-game performance statistics are strongly related to the type of champion selected by a player in a match. Fig. 1 illustrates the performance measure trends for the seven types of champions. The differences in trends among champion types align with our intuition based on their characteristics by design: some champions are designed to be equipped with strong attacks, whereas others are mostly set up for positions of assist. For example, Slayer champions, consistently ranked the highest in the number of kills, are known for targeted, high bursts of damage. In contrast, Controller champions output the least damage in terms of kills but the highest level of assists, which allows them to support teammates by weakening their enemies and strengthening their allies. Champion-based contextual information does not, however, directly impact winning outcomes. The concept of winning or losing applies to the whole team, and teams customarily consist of a range of different champions and champion types.
Fig. 2. (a) Histogram of the user entropy of champion type distributions. The Perfectly Balanced bar indicates the entropy if a user played all seven types of champions equally. (b) Histogram of the number of champions played per user for generalists and specialists.
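The entropy of a user's champion type distribution, shown in Fig. 2(a), can be computed as follows. The function name and sample inputs are our own illustration:

```python
import math
from collections import Counter

def champion_type_entropy(champion_types):
    """H(x) = -sum_i P_i * log(P_i) over a user's champion-type history."""
    counts = Counter(champion_types)
    n = len(champion_types)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# A perfectly balanced user over all 7 types reaches log(7), about 1.95,
# the "Perfectly Balanced" bar in Fig. 2(a); a one-type specialist scores 0.
balanced = champion_type_entropy(["Controller", "Fighter", "Mage", "Marksman",
                                  "Slayer", "Tank", "Unique"])
specialist = champion_type_entropy(["Tank"] * 20)
print(round(balanced, 2))  # 1.95
```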
Players tend to specialize in a few champions, and each champion they specialize in is usually played only in a specific role. While versatility in gameplay can be a desirable attribute, many players choose instead to focus on practicing and polishing their skills on select champions. To confirm this, we measure the entropy of the champion type distribution for each user. Suppose $P_i$ is the proportion of matches user $x$ played as champion type $i$; then the champion type distribution entropy $H(x)$ of user $x$ is:

$H(x) = -\sum_i P_i \log P_i$.   (1)

Fig. 2(a) displays a left-skewed distribution of entropy values. In particular, no user plays perfectly equally among all seven champion types, in which case the entropy would be maximal, i.e., equal to 1.95. We distinguish between generalists, who spread their usage more equally among champion types, and specialists, who concentrate on fewer champion types, at the top and bottom decile marks of entropy values, respectively. Most generalists spend more than 60% of their time spread over 3 or more champion types, while specialists focus almost exclusively (over 50% of the time) on one type of champion. In terms of unique champions, Fig. 2(b) shows that generalists have each played at least 8 different champions, with many exploring almost all 133 available champions. In contrast, most specialists play between 1 and 20 different champions. This reaffirms our belief that generalists and specialists exhibit fundamental differences in their gameplay preferences, supporting our hypothesis that a model accounting for differences among users and distinctions among champions can lead to more accurate predictions.

D. Prediction Tasks
Using the LoL dataset, we perform a range of prediction tasks, including predicting the winning outcome, kills, deaths, assists, KDA, and user engagement. User engagement is measured by whether a match was the last match a player played in a continuous gaming session (i.e., an end-of-session match), where a session is defined as a series of consecutive matches divided by break intervals shorter than 15 minutes. Prior studies show that players' performance deteriorates throughout a MOBA game session [15] due to cognitive depletion. Thus, predicting the end of a session is helpful for understanding short-term and long-term engagement with the game.

Depending on the prediction task, we exclude certain features from the training data due to dependency issues. For the target variables kills, deaths, and assists, KDA is excluded. For the target variable KDA, kills, deaths, and assists are excluded.

IV. METHODS
A. User-Context Tensor
We represent the data as a user-context tensor, which is used in tensor factorization to mine context-dependent behavior patterns. Specifically, we consider two types of contexts:
1) Global contexts: contexts that apply to everyone;
2) Individual contexts: contexts that are specific to each individual, voluntarily or not.
The resulting tensor $X$ is constructed as a users × global contexts × individual contexts tensor, where users, global contexts, and individual contexts are each represented by unique identifiers. Each element $x_{ijk} \in X$ is the normalized count of times user $i$ experienced global context $j$ together with individual context $k$. In our case, the global context is naturally the game version, organized chronologically, which applies to every user simultaneously. Individual contexts are in-game context settings specific to each player, which we take to be the champions. Thus, element $x_{ijk} \in X$ is the proportion of matches user $i$ played as champion $k$ in version $j$. Since feature dependencies should be avoided when using tensor factorization, we did not include the context variables champion type, queue type, role, and lane in the tensor, as they are strongly correlated with the choice of champion. Though our dataset contains only categorical context variables, our tensor construction method can easily be adapted to continuous context variables by taking the mean value instead of normalized counts. Since not every user played the game during the time span of every version, the tensor is very sparse. Depending on the test ratio, only 20-30% of the tensor is populated.

B. Tensor Factorization
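As a sketch of the construction described in Section IV-A, a small user × version × champion tensor can be assembled from hypothetical match records as follows. The record triples and dimensions are toy values; missing user-version slices are marked NaN so they can be masked during factorization rather than imputed:

```python
import numpy as np

# Hypothetical match records as (user, version, champion) triples;
# the real dataset would yield an 18,713 x 62 x 133 tensor.
records = [(0, 0, 1), (0, 0, 1), (0, 0, 2), (1, 1, 0)]
I, J, K = 2, 2, 3

counts = np.zeros((I, J, K))
for user, version, champion in records:
    counts[user, version, champion] += 1

# x_ijk: proportion of user i's matches in version j played as champion k.
# (user, version) slices with no matches are left as NaN (missing).
totals = counts.sum(axis=2, keepdims=True)
X = np.divide(counts, totals, out=np.full_like(counts, np.nan),
              where=totals > 0)

# User 0 played champion 1 two-thirds of the time in version 0.
print(X[0, 0])
```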
We decompose our data, represented in tensor form, using Non-Negative Tensor Factorization (NTF), a scalable algorithm based on PARAFAC/CANDECOMP, to obtain latent factors of $X$ [1]. These latent factors, or embeddings, are vector representations of the data in reduced dimensionality. By gleaning information from all users in various contextual settings, we can learn how different context settings impact each one individually. Fig. 3 shows an illustration of the user-context tensor when applied to the LoL dataset.

Fig. 3. Illustration of the user-context tensor of the LoL dataset and its factorized embeddings.

NTF works by approximating the tensor with a sum of rank-one tensors, subject to non-negativity constraints:

$X \approx \sum_{r=1}^{R} u_r \circ t_r \circ f_r$   (2)

where $R$ is the chosen number of components and $\circ$ denotes the outer product. This means that an element $x_{ijk}$ can be expressed as

$x_{ijk} \approx \sum_{r=1}^{R} u_{ir} t_{jr} f_{kr}$   (3)

for all $i \in 1, ..., I$, $j \in 1, ..., J$, and $k \in 1, ..., K$. We alternatively denote the latent factors as matrices of users $U \in \mathbb{R}^{I \times R}$, versions $T \in \mathbb{R}^{J \times R}$, and champions $F \in \mathbb{R}^{K \times R}$, where $I$ is the number of users, $J$ is the number of versions, and $K$ is the number of champions. The tensor can then be concisely written as $X = [[U, T, F]]$ using the Kruskal operator [17]. The $r$-th column of the matrices $U$, $T$, and $F$ corresponds to the column vectors $u_r$, $t_r$, and $f_r$, which we also call the component vectors. The rows of each matrix correspond to a single user, version, or champion depending on the factor matrix, which we also call the embedding of a user, version, or champion. The minimization problem in a typical CP decomposition is given by

$\min \left\| x_{ijk} - \sum_{r=1}^{R} u_{ir} t_{jr} f_{kr} \right\|_F \quad \text{s.t.} \quad u_{ir}, t_{jr}, f_{kr} \ge 0$   (4)

where $\| \cdot \|_F$ is the Frobenius norm.
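The weighted decomposition the paper performs with the Tensor Toolbox can be sketched in plain numpy/scipy. This is only an illustrative stand-in, not the paper's implementation: L-BFGS-B's simple lower bounds play the role of a bound-constrained solver, minimizing the masked Frobenius objective over the factor matrices while ignoring missing entries:

```python
import numpy as np
from scipy.optimize import minimize

def masked_ntf(X, W, R, seed=0):
    """Fit X ~ [[U, T, F]] with U, T, F >= 0, ignoring entries where W == 0."""
    I, J, K = X.shape
    Xz = np.where(W > 0, X, 0.0)  # the value is irrelevant where masked
    rng = np.random.default_rng(seed)

    def unpack(v):
        U = v[:I * R].reshape(I, R)
        T = v[I * R:(I + J) * R].reshape(J, R)
        F = v[(I + J) * R:].reshape(K, R)
        return U, T, F

    def loss(v):
        U, T, F = unpack(v)
        approx = np.einsum("ir,jr,kr->ijk", U, T, F)  # sum of rank-one tensors
        return ((W * (Xz - approx)) ** 2).sum()       # masked Frobenius norm

    res = minimize(loss, rng.random((I + J + K) * R), method="L-BFGS-B",
                   bounds=[(0.0, None)] * ((I + J + K) * R))
    return unpack(res.x), res.fun

# Sanity check on a small rank-2 non-negative tensor with ~20% entries missing.
rng = np.random.default_rng(1)
U0, T0, F0 = rng.random((5, 2)), rng.random((4, 2)), rng.random((3, 2))
X_true = np.einsum("ir,jr,kr->ijk", U0, T0, F0)
W = (rng.random(X_true.shape) > 0.2).astype(float)
(U, T, F), final_loss = masked_ntf(X_true, W, R=2)
```

The real factorization uses a dedicated weighted CP solver; this sketch merely shows that the masked objective and the non-negativity constraints fit a generic bound-constrained optimizer.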
Due to the considerable sparsity of our tensor, performance would degrade sharply using standard practices for sparse data such as imputation. Instead, we use a masked version of the minimization problem:

$\min \left\| w_{ijk} \left( x_{ijk} - \sum_{r=1}^{R} u_{ir} t_{jr} f_{kr} \right) \right\|_F$   (5)

where $w_{ijk}$ is 0 if $x_{ijk}$ is missing and 1 otherwise, for all $i \in 1, ..., I$, $j \in 1, ..., J$, and $k \in 1, ..., K$. The problem is optimized using a nonlinear bound-constrained optimization algorithm [18]. For our work, we use the implementation of weighted CP decomposition for incomplete data publicly available in the Tensor Toolbox for MATLAB [19].

NTF depends heavily on the choice of the number of components. Too many components can lead to overfitting, yet too few can lead to underfitting. In many tensor factorization problems, tools such as the Core Consistency diagnostic test (CORCONDIA) [20] or the ADD-ONE-UP method [21] can be used to estimate an appropriate number of components. However, such methods do not work well in our case due to the large amount of missing data. Instead, we employ a cross-validation approach to select the optimal number of components based on the predictive results. We detail our experimental setup in Section V-A.

C. Deep Neural Network Decoder
With the latent representations of users, global contexts, and individual contexts, we make predictions using a deep neural network decoder. At this stage, a training instance is a unique user-match pair.

Let $x^f_i$ denote the vector of individual context (champion types) for instance $i$. For simplicity, we drop the subscript $i$ whenever it is otherwise clear. We integrate $x^f$ with the learned individual context embeddings $F$ as follows:

$F' = x^f \odot F \quad \text{with} \quad F'_{jk} = x^f_j F_{jk}$,   (6)

where $\odot$ is the Hadamard, or element-wise, product applied row-wise. We can understand $F'$ to be the individual context matrix for a user-match instance. $F'$ is reduced to a 1-dimensional vector $f \in \mathbb{R}^R$ by summing over the rows within each component $r$. Since each user-match instance has only one champion type, $x^f$ is a binary vector with a 1 for the selected champion type and 0 elsewhere. The resulting individual context matrix $F'$ has all but one row of zeros in our specific problem. However, this framework works for any format of individual contexts.

To incorporate $f_i$ with other features and embeddings, we create a concatenated vector $h_i = [u_i, f_i, t_i, x^m_i]$, where $u_i$ is the user embedding, $t_i$ is the version embedding, and $x^m_i$ is the set of all other features in training instance $i$. Other features refer to the season, queue type, champion type, role, lane, map ID, match duration, and the timestamp of the match. Also potentially included in $x^m_i$ are kills, deaths, assists, and KDA, provided they are not the target variable and not affiliated with the target variable. All continuous features are normalized and all categorical features are one-hot encoded. $h_i$ is then passed to 8 fully connected layers with hidden sizes ranging from 256 to 2 in decreasing powers of 2.
Every layer is followed by a Leaky ReLU activation function and, optionally, a Dropout [22] layer. The output layer of the neural network produces a single value, to which either a ReLU (for regression) or a sigmoid (for binary classification) activation function is applied. Depending on the task, we minimize either the mean squared error (regression) or the binary cross-entropy (binary classification) with the Adam optimizer [23]. All weights are regularized. We train on a batch size of 2048 until convergence.

V. RESULTS
A. Evaluation
A random sample of 20% of the data is held out as the test set. This split is stratified by user ID, as NICE must learn user embeddings for every user. The remaining 80% of the data is used for training and hyperparameter tuning.

The baselines selected for comparison are Random Forest [24] and XGBoost [25], chosen for their adaptability to both regression and classification tasks and their ease of scaling to large input data. Both baseline models are grid-searched for the optimal hyperparameters using 5-fold cross-validation, independently for each target variable. Additionally, to show the effectiveness of using embeddings as latent features, we use the same DNN decoder built for NICE as another baseline by not providing the DNN with the latent embeddings as feature inputs. Instead of latent embeddings, all baseline models receive user IDs and champion IDs as one-hot encoded vectors and the version number as a discrete numeric feature. The rest of the feature inputs remain the same.

Binary classification problems are evaluated using AUC, and regression problems are evaluated using RMSE. To facilitate the comparison of results for regression target variables that span different ranges, we also compute the Normalized RMSE (NRMSE):
$\text{NRMSE} = \frac{\text{RMSE}}{Y_{\max} - Y_{\min}} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}}{Y_{\max} - Y_{\min}}$,   (7)

NRMSE can be used to compare the results of various regression tasks on the same scale.

The number of components used in the tensor factorization step is the most crucial hyperparameter for the NICE models. We explore the predictive performance for component numbers from 1 to 10. We determined the optimal number to be 6, the smallest number that produces the best results for all target variables without the risk of overfitting.

Table I reports our experimental results for each target variable. Examining the results, we observe that almost all baselines perform similarly, with XGBoost and the DNN being the best-performing baseline models. Importantly, NICE delivers improvements across all prediction tasks. The prediction of the winning outcome appears to be an easy task, with even baseline models achieving AUCs as high as .948. For this task, NICE outperforms the baselines with an AUC of .953. For the problem of predicting the end of a session, NICE significantly increases the AUC by 17%, which may be because some players are more likely to play longer sessions in one sitting than others, a behavioral pattern that is picked up by NICE. As for the regression tasks of predicting kills, deaths, assists, and KDA, NICE produces better results than the baselines for all target variables. Of the four, kills and assists are more challenging to predict due to their smaller baseline NRMSEs.
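Eq. (7) can be implemented directly; the helper below is our own sketch:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the target, as in Eq. (7)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / (y_true.max() - y_true.min())

# Errors of (1, 0, 1) over a target range of 10 give RMSE sqrt(2/3) / 10.
print(round(nrmse([0, 5, 10], [1, 5, 9]), 3))  # 0.082
```

Dividing by the target's range is what makes scores comparable across, say, kills (small range) and assists (large range).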
TABLE I
PREDICTION RESULTS

                Winning Outcome  End of Session  KDA            Kills          Deaths         Assists
                AUC              AUC             RMSE   NRMSE   RMSE   NRMSE   RMSE   NRMSE   RMSE   NRMSE
Random Forest   0.920            0.532           2.161  0.048   2.401  0.057   2.534  0.050   3.844  0.078
XGBoost         0.948            0.579           1.982  0.044   2.216  0.052   2.361  0.046   3.563  0.073
DNN             0.945            0.576           1.978  0.044   2.215  0.053   2.366  0.046   3.563  0.073
NICE
We evaluate classification problems with AUC (the higher the better) and regression problems with RMSE and NRMSE (the lower the better). The best results are indicated in bold.

TABLE II
NUMBER OF USERS AND CHAMPIONS WITH COMPONENT LABELS

            Component Labels                      Unlabeled
            0     1     2     3     4     5
Users       176   1148  2659  2495  2386  2476   7373
Champions   15    16    18    31    20    38     4

A user or champion is assigned component label i if the i-th component of its normalized embedding is the maximally activated component and at least 0.4.

It is also for these two target variables that we see the biggest improvement using NICE, with a 9% gain in NRMSE over the next best model for kills and 6% for assists. For the target variables KDA and deaths, NICE improves the baseline NRMSE by .001. Kills and assists are more clearly segmented by champion types than deaths and KDA, explaining why NICE is especially good at predicting kills and assists in the context of champion selection. These results provide evidence that the contextually aware model NICE leads to more informed predictions.
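The component-label assignment used in Table II, where a user or champion receives label i if component i is maximally activated and carries at least 0.4 of the normalized embedding, can be sketched as follows. The function name and sample vectors are hypothetical:

```python
import numpy as np

def component_label(embedding, threshold=0.4):
    """Return the argmax component if it holds >= 40% of total activation, else None."""
    e = np.asarray(embedding, dtype=float)
    share = e / e.sum()          # normalize so component shares sum to 1
    top = int(share.argmax())
    return top if share[top] >= threshold else None

print(component_label([0.1, 0.1, 0.7, 0.1]))   # 2
print(component_label([0.25, 0.25, 0.25, 0.25]))  # None (no dominant component)
```

Users whose activation is spread too evenly, like the second example, stay unlabeled, matching the "Unlabeled" column of Table II.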
B. Interpretation of Embeddings
A unique advantage of using tensor factorization is its interpretability. For this section, we apply NTF ($R = 6$) to the entire dataset to visually analyze the meaning of the factorized matrices.

We first identify the component membership level of champion types. This is done by first averaging champion embeddings to produce champion type embeddings. All entries are then normalized by component so that each component sums to 1. We then rank the squares of the entries in decreasing order and zero out entries that are beyond 95% of the norm of the component. From Fig. 4(a) we see that, on average, each component identifies with 2 to 3 champion types, with Marksmen champions being strongly identified by a single component.

To enhance the interpretability of the user factorization matrix $U$, we assign the component that each user most prominently identifies with as their label. We only assign labels to users whose maximally activated component contributes at least 40% of the total activation of the user's embedding. The number of users labeled with each component is listed in Table II. Component label 0 contains the fewest users, with less than 200, while all other component labels contain at least 1000 users each. Less than half of the users have relatively evenly activated embeddings and therefore were not assigned a component label. The labeled user embeddings are visualized in Fig. 4(b) using t-SNE [26]. Users sharing component labels occupy one cluster, indicating that the embeddings of users who share the same maximally activated component are also similar overall. In Fig. 4(c), we plot only the generalists and specialists identified in Section III-C based on their champion type distributions. Specialists are concentrated in the various corners of the plot, whereas generalists, having more balanced activation, are scattered in the middle.
This shows that the user embeddings indeed capture characteristics of specialization, or lack thereof, of some users.

We assess how the components are related to in-game performance measures. This can be done by tracking the users identified most prominently by each component label, or by tracking champions using the same method (Table II). We show the time series of performance measures across game versions, aggregated by user labels or champion labels (Fig. 5). The layered pattern across all performance measures according to component labels is visually similar to the performance trends by champion types in Fig. 1: each component can be ranked according to its average performance measure, and this ranking is stable over time. Moreover, the time series of performance measures partitioned by users and by champions exhibit parallel trajectories, a phenomenon that may be partly explained by the fact that users with particular component labels are frequent players of the champions with the same labels. Some components immediately stand out as having distinct performance characteristics. For example, component 4 (purple) is characterized by low kills and high assists. From Fig. 4(a) we know that component 4 is primarily associated with the champion class Controllers, and these characteristics are indeed consistent with Controller performance trends (Fig. 1). Similarly, component 3 (red), associated with Mages and Slayers (Fig. 4(a)), is characterized by high kills, high deaths, and low assists, a pattern in line with the champion types Mages and Slayers from Fig. 1.

Fig. 4. (a) Component activation of champion embeddings aggregated by their champion types. For each row of the seven champion types, we sort entries in descending order and then mask out any entries that are greater than 0.95 of the cumulative sum. (b) t-SNE visualization of the user embeddings, where users assigned component labels are color-coded accordingly, and users without component labels are shown in white. (c) t-SNE visualization of specialist and generalist users.

Fig. 5. Time series of in-game performance measures for data points for which the users (top row) or champions (bottom row) are assigned a maximally activated component label. Component 0 is omitted in the user label aggregation due to the small number of users labeled as component 0, which leads to high fluctuations in performance trends that distract visibility.

Finally, the interpretation of the temporal factor matrix (Fig. 6(a)) offers a few additional insights, particularly for component 0 and its relation to the year 2016. While temporal components 2 through 5 remain fairly stable over time, components 0 and 1 demonstrate an alternating pattern intersecting at game versions indexed 36-40, which happens to be pre-season 2016. At this time, the pick rates of champions, i.e., the number of matches a champion was picked to play out of the total number of matches, exhibit similar patterns. Indeed, the pick rates and the temporal activation correlate with Pearson's correlation coefficients of 0.88 and 0.83 for components 0 and 1, respectively, suggesting that component vectors 0 and 1 correspond to two distinct groups of champions.
Component 0 tracks champions that were more popular before pre-season 2016, and component 1 tracks champions that only grew in popularity starting from pre-season 2016.

Fig. 6(c)-(d) also illustrates a relationship between component 0 and user engagement. Beginning from game version 40 (the start of season 2016), the engagement from this group of users increased drastically and is higher than the average engagement of other users. Fig. 6(d) depicts a Raincloud plot [27] of the number of days online recorded for each user group. In stark contrast to the more than 600 average active days of users in other components, component 0 comprises a comparatively much more novice group of users, averaging fewer than 200 active days. This suggests that component 0 corresponds to a group of novice players dedicated to playing LoL in season 2016.

VI. DISCUSSION
Using tensor factorization methods on the user-context tensor, our model NICE considerably improves prediction over baseline methods that are applicable and amenable to similar types of problems. The advantage of NICE is its personalized, context-aware deep learning approach. By drawing information from all users and all contexts, NICE learns from the dataset holistically, rather than locally at each data point. Further, while we applied NICE to an LoL dataset, we also describe how NICE can be flexibly applied to any human-centered large dataset in which there are contextual variables. A user-context tensor can be formed in situations in which there are (i) global contexts that affect the population and (ii) individual contexts specific to each user. The same principles can be applied even more easily when there are only individual-specific contexts, using matrix factorization.

Fig. 6. (a) 3-unit moving average of the temporal component activation. (b) Pick rates of champions under each component label. (c) User engagement measured in terms of the average number of matches per user in each game version, for users under each component label. (d) A Raincloud plot [27] of user engagement measured in terms of the number of days users were online.

In addition to the contextually informed predictions, NICE is highly interpretable. The factor matrices induced by the NTF procedure open the opportunity for many human behavior analyses, such as examining the effects of contextual changes or treatments on users.
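To make the interpretability argument concrete, the non-negative CP (PARAFAC) factorization at the core of this kind of pipeline can be sketched with multiplicative updates in NumPy. The tensor shapes, rank, and iteration count below are illustrative assumptions, not the exact configuration used by NICE.

```python
import numpy as np

def ntf_cp(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Non-negative CP (PARAFAC) decomposition of a 3-way tensor X
    (users x contexts x time) via Lee-Seung-style multiplicative updates.
    Returns entrywise non-negative factor matrices U, C, T."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((I, rank))
    C = rng.random((J, rank))
    T = rng.random((K, rank))

    # Mode-n unfoldings (row-major flattening of the remaining modes).
    X0 = X.reshape(I, -1)                      # I x (J*K)
    X1 = np.moveaxis(X, 1, 0).reshape(J, -1)   # J x (I*K)
    X2 = np.moveaxis(X, 2, 0).reshape(K, -1)   # K x (I*J)

    for _ in range(n_iter):
        # Khatri-Rao products matching each unfolding's column order;
        # each update keeps the factors non-negative by construction.
        KR0 = np.einsum('jr,kr->jkr', C, T).reshape(-1, rank)
        U *= (X0 @ KR0) / (U @ (KR0.T @ KR0) + eps)
        KR1 = np.einsum('ir,kr->ikr', U, T).reshape(-1, rank)
        C *= (X1 @ KR1) / (C @ (KR1.T @ KR1) + eps)
        KR2 = np.einsum('ir,jr->ijr', U, C).reshape(-1, rank)
        T *= (X2 @ KR2) / (T @ (KR2.T @ KR2) + eps)
    return U, C, T
```

Each row of U is then a user embedding whose maximally activated entry can serve as that user's component label, and the columns of T trace temporal activation curves of the kind shown in Fig. 6(a).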
The underlying heterogeneity of users, as observed through their chosen in-game contexts and their given global contexts, is encapsulated in the latent embeddings.

As for future work, our effort will be two-pronged: from a methodological standpoint, we will work to extend our methodology to even richer, multi-way data; as for empirical applications, we will investigate the use of our framework to predict the performance or behavior of individuals in a variety of settings, from personalized health to social media. In all these domains, user-context data is abundant. A framework like ours can be beneficial for revealing hidden patterns in behavior, which can not only improve prediction but also help explain and interpret the data.

ACKNOWLEDGMENTS
The authors are grateful to DARPA for support under AIE Opportunity No. DARPA-PA-18-02-07. This project does not necessarily reflect the position or policy of the Government; no official endorsement should be inferred.

REFERENCES

[1] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009.
[2] C. Biancalana, F. Gasparetti, A. Micarelli, A. Miola, and G. Sansonetti, "Context-aware movie recommendation based on signal processing and machine learning," in Proc. 2nd Challenge Context-Aware Movie Recommendation. ACM, 2011, pp. 5–10.
[3] N. Nascimento, P. Alencar, C. Lucena, and D. Cowan, "A context-aware machine learning-based approach," in Proc. 28th Annu. Int. Conf. Comput. Sci. Softw. Eng., ser. CASCON '18. USA: IBM Corp., 2018, pp. 40–47.
[4] A. Sapienza, A. Bessi, and E. Ferrara, "Non-negative tensor factorization for human behavioral pattern mining in online games," Information, vol. 9, no. 3, p. 66, 2018.
[5] H. Hosseinmardi, A. Ghasemian, S. Narayanan, K. Lerman, and E. Ferrara, "Tensor embedding: A supervised framework for human behavioral data mining and prediction," arXiv:1808.10867, 2018.
[6] H. Hosseinmardi, H.-T. Kao, K. Lerman, and E. Ferrara, "Discovering hidden structure in high dimensional human behavioral data via tensor factorization," arXiv:1905.08846, 2019.
[7] D. M. Dunlavy, T. G. Kolda, and E. Acar, "Temporal link prediction using matrix and tensor factorizations," ACM Trans. Knowl. Discovery Data (TKDD), vol. 5, no. 2, p. 10, 2011.
[8] A. Sapienza, A. Panisson, J. Wu, L. Gauvin, and C. Cattuto, "Detecting anomalies in time-varying networks using tensor decomposition," in Proc. IEEE ICDMW. IEEE, 2015, pp. 516–523.
[9] B. W. Bader, M. W. Berry, and M. Browne, "Discussion tracking in Enron email using PARAFAC," in Survey of Text Mining II: Clustering, Classification, and Retrieval. London: Springer London, 2008, pp. 147–163.
[10] M. Ruffini, R. Gavaldà, and E. Limón, "Clustering patients with tensor decomposition," arXiv:1708.08994, vol. 68, pp. 126–146, 18–19 Aug 2017.
[11] A. Sapienza, P. Goyal, and E. Ferrara, "Deep neural networks for optimal team composition," Frontiers in Big Data, 2017.
[13] A. Semenov, P. Romov, S. Korolev, D. Yashkov, and K. Neklyudov, "Performance of machine learning algorithms in predicting game outcome from drafts in dota 2," in Int. Conf. Anal. Images, Social Networks and Texts. Springer, 2016, pp. 26–37.
[14] X. Lan, L. Duan, W. Chen, R. Qin, T. Nummenmaa, and J. Nummenmaa, "A player behavior model for predicting win-loss outcome in MOBA games," in Adv. Data Mining Appl. Springer International Publishing, 2018, pp. 474–488.
[15] A. Sapienza, Y. Zeng, A. Bessi, K. Lerman, and E. Ferrara, "Individual performance in team-based online games," Royal Society Open Science, vol. 5, no. 6, p. 180329, 2018.
[16] S. Donaldson, "Mechanics and metagame: Exploring binary expertise in League of Legends," Games and Culture, vol. 12, no. 5, pp. 426–444, 2017.
[17] J. B. Kruskal, "Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics," Linear Algebra Appl., vol. 18, no. 2, pp. 95–138, 1977.
[18] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, "A limited memory algorithm for bound constrained optimization," SIAM J. Scientific Comput., vol. 16, no. 5, pp. 1190–1208, 1995.
[19] E. Acar, D. M. Dunlavy, T. G. Kolda, and M. Mørup, "Scalable tensor factorizations for incomplete data," Chemometrics and Intell. Lab. Syst., vol. 106, no. 1, pp. 41–56, 2011.
[20] R. Bro and H. A. Kiers, "A new efficient method for determining the number of components in PARAFAC models," J. Chemometrics, vol. 17, no. 5, pp. 274–286, 2003.
[21] Z.-P. Chen, Z. Liu, Y.-Z. Cao, and R.-Q. Yu, "Efficient way to estimate the optimum number of factors for trilinear decomposition," Analytica Chimica Acta, vol. 444, no. 2, pp. 295–307, 2001.
[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," J. Machine Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[23] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980, 2014.
[24] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[25] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining. ACM, 2016, pp. 785–794.
[26] L. v. d. Maaten and G. Hinton, "Visualizing data using t-SNE," J. Machine Learn. Res., vol. 9, no. Nov, pp. 2579–2605, 2008.
[27] M. Allen, D. Poggiali, K. Whitaker, T. R. Marshall, and R. A. Kievit, "Raincloud plots: a multi-platform tool for robust data visualization,"