Automatic Player Identification in Dota 2
Sizhe Yuen
Maritime Engineering, University of Southampton, [email protected]
John D. Thomson
School of Computer Science, University of St Andrews, [email protected]
Oliver Don, [email protected]
ABSTRACT
Dota 2 is a popular, multiplayer online video game. Like many online games, players are mostly anonymous, being tied only to online accounts which can be readily obtained, sold and shared between multiple people. This makes it difficult to track or ban players who exhibit unwanted behavior online. In this paper, we present a machine learning approach to identify players based on a 'digital fingerprint' of how they play the game, rather than by account. We use data on mouse movements, in-game statistics and game strategy extracted from match replays and show that for best results, all of these are necessary. We obtain an accuracy of 95% on the problem of predicting whether two different matches were played by the same player.
Author Keywords
Machine learning; online games; identification; digital forensics;
INTRODUCTION
Dota 2 is a popular, multiplayer online video game. Like many online games, players are mostly anonymous, being tied only to online accounts which can be readily obtained, sold and shared between multiple people. This paper presents an automatic way of identifying a player by establishing a 'digital fingerprint' based on how a player plays the game, rather than relying on account information. It shows that incorporating game-centric information into the model is necessary for the best performance, compared to existing context-free digital forensic techniques which rely on mouse movements alone. Using a combination of game information and mouse movement analysis, we are able to identify whether two different matches were played by the same player with an accuracy of 95%.

We believe this is an interesting result in itself, but it also has real-world utility. The anonymity of online gaming and the ease of obtaining new accounts often leads to unwanted behavior by a minority of players. This can manifest in a number of ways: abusive communication, cheating and matchmaking manipulation. Unchecked abusive communication often manifests itself as sexist and homophobic abuse, which leads to an unpleasant and hostile game environment. Matchmaking manipulation occurs when a player uses a different account (usually purchased illegally) with a different skill rating than their own, which leads to unbalanced games.

This presents a challenge for the developer of these games: simply banning accounts often does not work. As Dota 2 is a free-to-play multiplayer game, identifying players is especially challenging.
A single person can create multiple accounts anonymously merely by providing different email addresses. As a result, unwanted behavior displayed by some players towards one another [16] and the rise of illegitimate secondary markets to sell accounts with higher ranks present a serious problem. Players who are banned from the game for bad behavior may continue to play using a fresh account. Additionally, without the ability to verify and link a person to an account, it is easy for players to cheat in amateur online tournaments by playing in the place of another player. A lot of manual effort and luck is required to determine if someone cheated this way. In a recent example, a player was caught cheating in the $30,000 CSL Dota 2 tournament for college students. The cheater was only caught after officials were tipped off [19] due to obviously strange behavior. By using machine learning techniques to identify players based on their behavior, cases like this would be easy to identify and resolve.

Learning player identity from behavior in order to take action is also more effective than taking action based on account names, IP addresses or hardware ID bans. Players can simply buy or create new accounts if their original one is banned; IP addresses are rarely static [24] due to NAT gateways, and banning a range of addresses may punish innocent players; hardware ID bans may be effective in the short term, but hardware can still change hands between people. A system which can recognise player behavior can be more effective in the longer term, especially as we work from a low-level, fine-grained view of the data rather than high-level behavioral elements.
BACKGROUND

Dota 2
Defence of the Ancients (Dota) 2 [2] is a popular free-to-play multiplayer online battle arena (MOBA) video game developed by Valve Corporation. In Dota 2, the player controls a single hero as part of a team of five players competing against an opponent team. Each hero has their own unique abilities and characteristics; for example, some heroes focus on dealing damage with spells and abilities, while other heroes use passive abilities to augment their attacks. To win, the team must destroy the opponent's base structure, known as the Ancient, located deep in the opponent-controlled side of the map. The map is split into three lanes, with towers that must be destroyed in succession. Weaker, uncontrollable units called creeps periodically spawn from a team's Ancient and travel down the three lanes. Heroes gain experience when nearby creeps and heroes on the opposing team die. A hero then gains levels with experience, increasing the hero's powers and abilities.

Figure 1: Representation of the map in Dota 2.

Additionally, heroes gain gold throughout a match which can be used to purchase items. Items provide powerful bonuses and abilities to a hero. Every hero has 6 slots in their inventory in which they can place purchased items. Items can also be placed in the stash or backpack, but do not provide any bonuses when placed there. Gold is gained from multiple sources, most notably by getting the killing blow on creeps, killing opponent heroes, destroying opponent buildings and passively over time.

The map in Dota 2 is asymmetrical and split in half for the two teams: Radiant and Dire. Figure 1 shows a representation of what the map looks like. Playing on different sides leads to slightly different strategies due to the asymmetry. Other, more complicated mechanics in the game, such as neutral jungle creeps and runes, are omitted as they do not form any part of the features or data extracted in this work.

Related work
Starcraft 2 player identification
The problem of player identification based on play style and behavior has been explored before in the game of Starcraft 2. Liu et al. [17] first looked at using machine learning algorithms to identify a Starcraft 2 player from features extracted from match replays, and followed up with further research [18] on predicting a player's next actions based on their previous actions in the match.

Their work has a similar goal to ours of predicting players based on behavioral data. However, our work differs in multiple ways. Firstly, the gameplay of Starcraft 2 and Dota 2 is different, as the player controls a large number of units in the former and only one in the latter. This means the features used for player prediction in Starcraft focused more heavily on the strategy and build order of players rather than fine-grained data such as mouse movement. Secondly, Liu et al. focussed on classification from a fixed pool of players. While we also do the same, we go further and look at classifying whether pairs of matches belong to the same player. This allows our predictions to be much more general, removing the need to re-train on every new pool of players in the dataset.
Dota 2
There has been a variety of previous work on data analysis in Dota 2. Much of the growing research in Dota 2 comes from the wealth of data provided by the replay system, which ranges from the positions of each player's mouse cursor to the health points of each creep entity.

One area of exploration with Dota 2 data is the classification of player roles. As Dota 2 is a team game, each player on the team typically fulfils a certain role in a match, similar to traditional sports. Gao et al. [13] presented positive results in the identification of both the hero and the role that a player played, using a mix of performance data and behavioral data that involved ability and item usage. Eggert et al. [10] took a further step in feature generation for role classification by constructing complex attributes using low-level data from a parsed replay of a match. This included features such as player positional movement and damage done during teamfights.

Data analysis in Dota 2 does not only involve analysis of in-game data using replays. Hodge et al. [14] used a mix of pre-match draft data, such as the history of a player's performance on different heroes, and in-game data of game states at different sliding window intervals to predict the outcome of unseen matches. Taking match outcome prediction further, Yang et al. [26] were able to predict match outcomes in real time as the match progressed, to get the probabilities of each team winning at every minute of a match. For real-time data, they showed that data from later stages of a match was much more informative, as early portions of a match were often similar to other matches. Their work was compared with work from Conley and Perry [7], whose focus was on creating an engine for hero pick recommendations based on win predictions, and with work from Kinkade and Lim [15], who also investigated combinations of in-game statistics and draft picks for outcome prediction. Drachen et al. [9] studied how team behavior varied as a function of player skill, specifically the movement and positioning of heroes by players as spatio-temporal data. This contrasts with most other studies, which have focused only on temporal data like the gold and XP per minute statistics.

There is also existing research that uses qualitative data rather than quantitative data. Nuangjumnong and Mitomo [20] conducted a survey of players which showed a correlation between the leadership style of players and the role they played in the team. Summerville et al. [23] studied the drafting phase of the game to find common trends and predict the draft sequence. In particular, because the draft sequence was described as a list of words, they chose to use a one-hot encoding with a categorical cross-entropy loss to encode the hero names.
Other video games
Player behaviour in video games has often been modelled to gather meaningful data for further analysis. Pfau et al. [21] applied machine learning techniques to construct deep player behaviour models in the video game Lineage II. These models can be applied in online crime detection for cheating and for autonomous testing of the game. Bakkes et al. [5] provided approaches to player behavioural modelling with the goals of improving game AI and improving game development. Drachen et al. [8] developed in-depth behavioural models for players in the video game Destiny to identify distinct play-styles and profiles that players can be clustered into.
Mouse dynamics
In research unrelated to Dota 2, many studies have investigated the use of mouse dynamics to detect user identity. This is a behavioral biometric technology that analyses different mouse movement attributes, such as the velocity or curvature of the mouse cursor. A significant portion of Dota 2's gameplay revolves around the use of mouse clicks and movements, combined with occasional key presses. Bhatnagar et al. [6] compared the use of mouse and keyboard dynamics as biometric techniques and stated that keystroke dynamics have lower predictive accuracy, but mouse dynamics data tends to be inconsistent and error-prone due to varied hardware input devices.

Mouse movement data extraction can be done explicitly by using a predefined activity, or implicitly by monitoring typical activity without a specified task [11]. For example, Gamboa and Fred [12] used a simple memory game to extract their data. This method allows for well-defined actions. Feher et al. [11] went further by categorising mouse movement features into distinct hierarchies, which allowed for precise categorisation of various different types of mouse movement. They compared their results with a histogram-based method from Awad and Traore [4] that aggregated different types of mouse action, and obtained a noteworthy improvement in the accuracy of their models.
METHODOLOGY
In this paper, we investigated the use of three different featuresets to determine who the player is. Each featureset explores a different aspect of the game, to see which aspects have a greater effect in predicting players. The mouse movement features are fine-grained biometric features that directly capture player behavior based on how they move their mouse. The game statistics features are higher-level results of player behavior; for example, an aggressive player may tend towards higher kill/death statistics compared to a more passive player. Finally, the itemization features capture both the strategy and the gameplay behavior of players. Players typically have strategies (or builds) that they use on the same hero, which is reflected in the items they buy. The position of the items in the inventory also indicates specific player behavior based on their personal keyboard bindings for the game.

The features are used both individually and in combination with the other features on three different machine learning models (logistic regression, random forest classifier and multi-layer perceptron) to find the best model and feature combination for predicting player behavior.
Data collection and processing
A combination of the OpenDota [3] API and Valve's official API was used to download replays. In particular, the OpenDota API allowed many aspects of the replays to be controlled, such as the game mode and hero id, leading to fewer uncontrolled variables. The following parameters were used for our datasets:

• Game mode: All pick
• Team: Radiant
• Date: Before 18 November 2018 (to avoid changes from patch 7.20 affecting some of the existing work)
• Hero id: Constant, as the same hero was used for all the datasets

Match replays were parsed using clarity, an open source parser developed by Martin Schrodt [22]. It allowed fine-grained data to be extracted from each replay on each game tick. After the data was extracted, further processing was done to encode the data into useful features to create our dataset.
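As a rough sketch of how these dataset parameters translate into a filter over downloaded match metadata (the field names below are illustrative assumptions, not the actual API schema):

```python
# Hypothetical sketch of the dataset filter: match metadata is reduced to
# replays satisfying the four collection criteria. Field names, game-mode
# and hero ids are assumptions for illustration only.
from datetime import datetime, timezone

PATCH_7_20 = datetime(2018, 11, 18, tzinfo=timezone.utc)
ALL_PICK = 1          # assumed game-mode id
TARGET_HERO = 14      # assumed hero id, constant across the dataset

def keep_match(match: dict) -> bool:
    """Return True if a match satisfies all four dataset parameters."""
    played = datetime.fromtimestamp(match["start_time"], tz=timezone.utc)
    return (match["game_mode"] == ALL_PICK
            and match["is_radiant"]            # player was on Radiant
            and played < PATCH_7_20            # before patch 7.20
            and match["hero_id"] == TARGET_HERO)

matches = [
    {"start_time": 1540000000, "game_mode": 1, "is_radiant": True, "hero_id": 14},
    {"start_time": 1545000000, "game_mode": 1, "is_radiant": True, "hero_id": 14},
]
dataset = [m for m in matches if keep_match(m)]  # keeps only the first match
```

The second example match is dropped because its timestamp falls after the patch 7.20 cutoff; in practice such a predicate would be applied to the API's paginated match listings before any replays are downloaded.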
Mouse movement features
The mouse movement features extracted from the Dota 2 match replays followed the concept from Feher et al. [11] of forming a hierarchy of atomic actions, such as left click or mouse move, and complex actions made up of multiple atomic actions, such as mouse drag. As the data available from replays is less granular and does not contain specific events such as mouse click down and mouse click up, keyboard presses were combined with the mouse data to form the complex mouse actions in this work. Two classes of mouse actions are defined: atomic and complex.

Atomic mouse actions
The four atomic mouse actions are as follows:

1. Mouse movement sequence
2. Attack command
3. Move command
4. Spell cast command

A mouse movement sequence is defined as a sequence of positions. Rather than using a fixed interval of time, a threshold τ is defined to end the sequence if no change in cursor position has occurred within the threshold time. This more naturally records a sequence of mouse movements, without breaking up a sequence to fit within a fixed time interval. A larger threshold increases the average number of actions in the sequence. A threshold of 500 milliseconds was used in Feher et al.'s [11] study. This was reduced to 300 milliseconds for capturing Dota 2 mouse features, as it was found that a 500 millisecond threshold created fewer and longer mouse movement sequences than expected.

Each movement sequence consisted of three vectors of length n, where n is the number of game ticks:

• t = {t_i}, i = 1..n - the game tick
• x = {x_i}, i = 1..n - the x coordinate sampled on game tick t_i
• y = {y_i}, i = 1..n - the y coordinate sampled on game tick t_i

Feature                      | Definition
Angle of movement            | θ_i = arctan(δy/δx) + Σ_{j=1}^{i} δθ_j
Curvature                    | c = δθ/δs
Rate of change of curvature  | Δc = δc/δs
Horizontal velocity          | V_x = δx/δt
Vertical velocity            | V_y = δy/δt
Velocity                     | V = √(V_x² + V_y²)
Acceleration                 | V′ = δV/δt
Jerk                         | V″ = δV′/δt
Angular velocity             | ω = δθ/δt

Table 1: Basic mouse movement features.

n can vary between different movement sequences for longer or shorter sequences. These vectors are further processed to give the list of mouse movement features shown in table 1, based on Gamboa and Fred's [12] features.

The other three atomic actions consist simply of the game tick, x and y coordinate. They represent the in-game intent of the mouse movement. For example, if a player right-clicks on the terrain, it is a move action to move their character, but if a player right-clicks an enemy entity, it is an attack action.

Complex mouse actions
The combination of a mouse movement sequence and one of the three command actions creates a complex mouse action. There are three complex mouse actions, each corresponding to its respective atomic command. Complex mouse actions represent a mouse movement sequence leading up to a command. Each of the three complex actions is recorded in the same way, but records different kinds of data. For example, the complex move action will be recorded throughout a match, while the complex spell cast action will typically occur only during teamfights.

In addition to the features mentioned in table 1, two more features are included for complex mouse actions:

• t_n - the number of game ticks between the last mouse position of the movement sequence and the mouse position of the command.
• d - the distance travelled between the last mouse position of the movement sequence and the mouse position of the command.

Finally, because the feature vectors for a mouse movement sequence are of varying length, statistical summaries must be taken. The minimum, maximum, mean and standard deviation are used, leading to a total of 38 features for a complex mouse action.

Game statistic features
In-game statistics are a useful way to determine player performance in Dota 2. Many community websites [1, 3] track these statistics to let players view the performances in their match history. Many existing studies [14, 15, 25, 26] also use these features, especially for predicting the performance of teams and players.

The statistics extracted for this work are:

• Kills
• Assists
• Deaths
• Gold per minute
• XP per minute
• CS (creep score) per minute
• Denies
• Actions per minute
• Move commands on target per minute
• Move commands on position per minute
• Attack commands on target per minute
• Attack commands on position per minute
• Spell cast commands on target per minute
• Spell cast commands on position per minute
• Spell cast commands with no target per minute
• Hold position commands per minute

Most of the statistics are taken as per-minute averages rather than totals, because they can vary greatly depending on the length of the game. In general, the game statistics are strong indicators of performance, as winning teams usually have higher gold and XP numbers. They can also provide behavioral information; for example, aggressive players may often get more kills than more passive players, but may also get more deaths as a result. Further, there are a few different methods for players to accomplish the same action, which can be revealed based on whether they use commands on targets or on positions.
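Returning to the mouse movement features, the kinematic quantities of Table 1 and the min/max/mean/std aggregation applied to complex mouse actions can be sketched as follows. The exact discretisation used in the paper is not specified, so the finite-difference scheme here is an assumption:

```python
# Sketch of per-step kinematic features for one variable-length movement
# sequence (t, x, y vectors sampled per game tick), followed by the
# fixed-length min/max/mean/std aggregation. Illustrative only.
import numpy as np

def movement_features(t, x, y):
    """Per-step kinematics (Table 1 style) for one mouse movement sequence."""
    t, x, y = map(np.asarray, (t, x, y))
    dt, dx, dy = np.diff(t), np.diff(x), np.diff(y)
    theta = np.arctan2(dy, dx)             # angle of movement per step
    ds = np.hypot(dx, dy)                  # path length per step
    vx, vy = dx / dt, dy / dt              # horizontal / vertical velocity
    v = np.hypot(vx, vy)                   # velocity magnitude
    accel = np.diff(v) / dt[1:]            # acceleration
    jerk = np.diff(accel) / dt[2:]         # jerk
    curvature = np.diff(theta) / ds[1:]    # δθ/δs (assumes ds > 0)
    return {"vx": vx, "vy": vy, "v": v, "accel": accel,
            "jerk": jerk, "curvature": curvature}

def aggregate(feats):
    """Fixed-length vector: min, max, mean and std of every feature."""
    return {f"{name}_{stat}": fn(vals)
            for name, vals in feats.items()
            for stat, fn in [("min", np.min), ("max", np.max),
                             ("mean", np.mean), ("std", np.std)]}

# A toy horizontal, accelerating cursor path.
t = [0, 1, 2, 3, 4]
x = [0.0, 1.0, 3.0, 6.0, 10.0]
y = [0.0, 0.0, 0.0, 0.0, 0.0]
vec = aggregate(movement_features(t, x, y))  # 6 features x 4 stats = 24 values
```

Aggregating this way is what lets sequences of different lengths share one feature space, at the cost of discarding the sequence's temporal ordering.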
Itemization features
The items a player buys on their hero have been shown by Eggert et al. and Gao et al. to be a strong indicator of the hero selection and role of the player. By itself, itemization is likely not enough to tell the difference between players, as multiple players on the same hero are very likely to buy the same items. However, items can be placed into different inventory positions, which is a more personal choice by players. Moreover, many items in Dota 2 have active abilities. The inventory position of an item can therefore reflect a player's keyboard bindings. It is not possible to extract a player's personal settings from a replay, so using item positions is a close approximation. Figure 2 shows how the items are placed in different slots with associated key bindings.

Figure 2: Representation of an inventory with items in Dota 2. The letters next to the items are bound to that inventory position and indicate the key to press to activate that item.

The inventories are sampled once at the beginning of a match and once at the end to prevent variance. Items sampled at the beginning will not be affected by variables such as performance, as the game has just started. Items sampled at the end are similar, as most items that players want will have already been purchased and are less affected by the flow of the match. The items and their positions are then encoded with multiple methods: feature hashing, one-hot encoding, selective one-hot encoding and item difference.

Selective one-hot encoding only used a small subset of items deemed more useful for player prediction. Two subsets were chosen: starting items and boots. There are only a small number of items a player can buy at the start of a match, which restricts the number of items from about 300 to 30 when looking at only starting items. This reduces the large number of features that one-hot encoding generates.

EXPERIMENTS
Two different experiments were done to explore the problem of predicting player identity in Dota 2 based on the features discussed earlier. A dataset comprising 93 players with 356 matches was created. The number of matches per player varied, with some players having 10 matches in the dataset while others had only 1 or 2. The goal of the first experiment was to see if we could successfully identify one player out of the pool of players in the dataset given a single match.

The second experiment took a more generic approach. Rather than predicting out of a static pool of players, the goal was to identify whether two matches belonged to the same player or not.
Figure 3: Classifying a single match with multiple different features.

In this approach, there is no concept of individual players, only the binary choice of whether the two given matches were played by the same person. Pairs of matches were created from the original dataset, which gave 3071 combinations where two matches belonged to the same player. The number of combinations where two matches belonged to different players was much larger, but was randomly filtered down to 3071 to ensure no bias towards the negative samples.

The features used in the second experiment differed slightly from the first experiment, because all matches have different lengths, making it difficult to compare the features from two matches without further modification. Further statistics are calculated for each mouse movement feature over the entire match. For example, the horizontal velocity of a mouse action creates four features: the mean, standard deviation, minimum and maximum horizontal velocity of each individual mouse action. There are drawbacks to this approach, most notably the loss of data from taking an average of averages, as the thousands of mouse actions are reduced to only a small number of statistics. To alleviate this issue, matches are split up into portions, with each portion having its own statistics. This also allows each individual portion to be analysed. However, this method is not scalable, as more portions means more features, leading to the curse of dimensionality.
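The balanced pair construction described above can be sketched as follows (a hypothetical helper, not the authors' code): same-player pairs are enumerated exhaustively, and different-player pairs are shuffled and truncated to the same count.

```python
# Sketch of pair construction for experiment 2: positive pairs come from
# the same player's matches, negative pairs are randomly down-sampled so
# the two classes stay balanced. Player/match ids are illustrative.
import itertools
import random

def build_pairs(matches_by_player: dict, seed: int = 0):
    # All unordered pairs of matches by the same player (label 1).
    same = [(a, b) for ms in matches_by_player.values()
            for a, b in itertools.combinations(ms, 2)]
    # All cross-player pairs (label 0), shuffled then truncated to balance.
    players = list(matches_by_player)
    diff = [(a, b)
            for p, q in itertools.combinations(players, 2)
            for a in matches_by_player[p]
            for b in matches_by_player[q]]
    random.Random(seed).shuffle(diff)
    diff = diff[:len(same)]
    labels = [1] * len(same) + [0] * len(diff)
    return same + diff, labels

pairs, labels = build_pairs({"p1": ["m1", "m2", "m3"], "p2": ["m4", "m5"]})
```

With 3 matches for one player and 2 for another, this yields 4 positive pairs and 4 down-sampled negative pairs, mirroring the 3071/3071 balance used in the paper.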
Experimental design
The mouse movement, game statistic and itemization features were explored both separately and together to identify their performance in player prediction. It was assumed that, within the dataset, all replays downloaded from the same account were matches by the same player.

As the different features had different dimensions, each was evaluated with a separate model, with the results of each model combined using a simple multi-layer perceptron with 1 hidden layer. Figure 3 shows how matches are split up into each feature and model separately before being combined. A multi-layer perceptron is used as the combination step so it can learn the relative weights for each feature, rather than using them all equally in a voting scheme or setting arbitrary weights. Further, this approach is modular and allows features to be added or removed easily to test the performance of different combinations of features.
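This combining step can be sketched as a simple stacking setup. Everything below is synthetic and the scikit-learn models are stand-ins; the paper's exact architectures and hyperparameters are not reproduced:

```python
# Sketch of the modular architecture: one base model per featureset, with a
# 1-hidden-layer MLP learning how to weight the base models' probability
# outputs. Data is synthetic; shapes and settings are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
# Three toy featuresets of different dimensionality, weakly tied to y.
featuresets = [y[:, None] + rng.normal(0.0, 1.0, (n, d)) for d in (6, 4, 3)]

# One base model per featureset.
base = [LogisticRegression(max_iter=1000).fit(X, y) for X in featuresets]
# Stack each base model's positive-class probability into a (n, 3) matrix.
stacked = np.column_stack([m.predict_proba(X)[:, 1]
                           for m, X in zip(base, featuresets)])

# The combiner learns relative weights for the three feature channels.
combiner = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                         random_state=0).fit(stacked, y)
acc = combiner.score(stacked, y)   # training accuracy, for brevity
```

Because the combiner only ever sees each base model's probability output, a featureset can be added or dropped by adding or removing one column, which is the modularity the text describes.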
Figure 4: Accuracy of individual types of complex mouse action. It can be seen that attack actions are the most useful for player prediction across all three models.
Figure 5: Results of classifying each match with all three types of mouse actions combined.

Accuracy, precision and recall metrics were gathered using 5-fold cross validation. Accuracy measures the overall performance of a model by computing the number of correct predictions over the total number of predictions. Precision and recall are complementary metrics that measure the relevance of predictions, with precision taking into account false positive results and recall taking into account false negative results:

precision = true positives / (true positives + false positives)
recall = true positives / (true positives + false negatives)

Three different models were chosen based on models that previous work found successful: logistic regression, random forest classifier and multi-layer perceptron.

RESULTS

Player identification (experiment 1)
Mouse movement features
First, the mouse movement features were evaluated individually to give an indication as to which type of complex mouse action is more informative for player prediction. Figure 4 shows that mouse movements following an attack action were the most predictive, with movements following a spell cast being the least predictive. It is important to note that the distribution of complex mouse actions is not equal, with an average of 20%, 78% and 2% for attack, move and cast actions respectively. This directly affects the number of data points available for training and testing.

Taking this into account, complex attack actions make a strong case for being the most effective at predicting player behavior, as they are less common, but more predictive than complex move actions. Complex move actions seem to be too common to be a strong indicator of player behavior. They are sent on the order of a few thousand commands in every match, because movement and positioning are fundamental parts of Dota 2's gameplay. This commonality can lead to difficulty in distinguishing between different players, especially when players are sending move commands multiple times a second, making the mouse movement sequences very short. Complex spell actions should follow the pattern of complex attack actions, since they are also rare and unique compared to move actions. However, it seems their rarity (2%) limits their usefulness.

When looking at the mouse actions used together in figure 5, we can see the differences between each machine learning model clearly. The random forest classifier has perfect precision but low recall, indicating a lack of predicted false positives. From the test set, we found 0 false positives and a large proportion of false negative predictions. There are still true positive predictions, so the model is not predicting only negatives, but it is highly biased towards them.
This suggests that a specific player's (the positive sample) behavior is difficult to differentiate from another player's in most cases, but when it is detected, it is very obviously from the specific player.

The performance of logistic regression was quite noteworthy, as it matched the neural network in all three performance metrics. This is an improvement compared to using individual complex mouse actions. It also shows that the combining network was able to learn useful weights to apply to each of the three logistic regression models, and that a mixture of different complex mouse action types is more useful than each mouse action alone.
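The perfect-precision, low-recall pattern seen for the random forest follows directly from the metric definitions; with made-up confusion-matrix counts:

```python
# Tiny illustration of the metric definitions with hypothetical counts
# resembling the random forest's behaviour: zero false positives gives
# perfect precision, while many false negatives drag recall down.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

tp, tn, fp, fn = 10, 50, 0, 40   # made-up counts, not the paper's results
p = precision(tp, fp)   # 1.0: no false positives at all
r = recall(tp, fn)      # 0.2: most true positives were missed
a = accuracy(tp, tn, fp, fn)   # 0.6: recall failures also depress accuracy
```

This is why a model can look flawless on precision while still misclassifying most positive samples.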
Game statistic features
Using game statistic features gave a different set of results compared to using mouse action features, with the neural network performing poorly and the other two models achieving very high accuracies. As shown in figure 6, the random forest classifier had perfect results. To confirm this, we ran further cross validation with varying values of k up to 12 for additional trials. It was found that the accuracy fell only twice across these trials.

Itemization features
Figure 6: Results of classifying each match with game statistics.

Each encoding method of itemization gave different performances, with the random forest classifier and multi-layer perceptron giving the same pattern for each encoding method, as shown in figure 7. Firstly, encoding with boots-only data had the lowest accuracy of all the encoding methods. This makes sense, as it uses strictly fewer items than the other encoding methods. This represents a trade-off between computation of features and accuracy. For comparison, the one-hot encoding of only starting items gets 4%, 6% and 7% more accuracy for each model respectively, but uses 186 features compared to the 42 features of encoding only boots data.

We also found a small improvement from using starting items only compared to using all items with one-hot encoding. Recall that the encoding of all items is sampled twice in a match, once at the beginning and once at the end. The close performance of the two encoding methods suggests a few things. Firstly, the massive increase in binary features does not heavily impact performance. The encoding of all items contains about 9 times the number of features (281 items vs 31 items) but the accuracy deviates by less than 5%. Secondly, though the starting items do not have the second sample of items at the end of the match, they still predict better in two of the three machine learning models. This shows that item locations at the start of a match are more useful for player prediction than those at the end of a match. This makes sense from the game's perspective: the items a player buys throughout a match often depend on the strategy, composition and context of their team, the opposing team and how well they are doing, so there is more variance in end-of-match items compared to the start of matches. Further, many matches end before a player finishes buying all the items they need, so item locations recorded at the end of a match could be temporary and differ from other matches by the same player, depending on the context of the match.
Combined features
Combining the features together gave mixed results in most cases. There were some improvements in specific cases, but there were also cases where performance decreased.

Logistic regression was able to predict well in general, with the combination of game statistics and boots achieving 99% accuracy. There was also an interesting anomaly: combining mouse movement, game statistics and one-hot encoding of all items gave a drastic reduction in accuracy. This may be due to one-hot encoded items and mouse movement being a bad combination, as it was the only other combination that had lower than 90% accuracy.

Figure 7: Results of classifying each match with itemization features.

The random forest classifier showed reduced accuracy in most combinations using mouse movements compared to using the features alone. For example, the various itemization encoding methods had about 85-90% accuracy and game statistics had close to 100%. However, when any of these features are combined with the mouse movement data, the accuracy drops. It can also be seen in table 2 that combinations of itemization and game statistics do well, so it is only the mouse movement features that are an issue for this model. Recall that the precision for using mouse movement features in the random forest classifier was extremely high with low recall, which reduced its accuracy. The lack of accuracy even when combined with accurate features such as game statistics shows that it is not enough to combine all features together; bad features add a lot of noise to the random forest model.

Finally, the multi-layer perceptron was able to combine features together better. Though it performed poorly using only statistics, the addition of mouse movements and itemization features helped boost its accuracy. Interestingly, the individually better-performing item encoding methods were not always better than others when combined with other features. For example, using mouse features with the hashed items and mouse features with boots only gave similar accuracy, even though the hashed encoding was more accurate alone compared to the boots-only feature.
Same player identification (experiment 2)
Mouse movement features
Figure 8 shows the performance of the three machine learning methods. In general, logistic regression performs much worse for this problem, regardless of how the match is split into time-slices, which shows the non-linear nature of this problem compared to the first one. Splitting the match into multiple time-slices neither helps nor hinders the accuracy for logistic regression, which shows it is unlikely to be an issue of too many features affecting the performance; otherwise, a greater fall in accuracy would be expected in Figure 9.

Features                       | Logistic Regression | Random Forest | Multi-layer Perceptron
Mouse + Stats                  | 0.959               | 0.597         | 0.738
Mouse + Items hashed           | 0.919               | 0.646         | 0.909
Mouse + Items one-hot          | 0.888               | 0.606         | 0.939
Mouse + Starting items         | 0.918               | 0.617         | 0.919
Mouse + Boots only             | 0.911               | 0.857         | 0.908
Stats + Items hashed           | 0.969               | 0.990         | 0.868
Stats + Items one-hot          | 0.969               | 0.979         | 0.939
Stats + Starting items         | 0.959               | 1.000         | 0.929
Stats + Boots only             | 0.990               | 0.959         | 0.859
Mouse + Stats + Items hashed   | 0.949               | 0.636         | 0.808
Mouse + Stats + Items one-hot  | 0.726               | 0.769         | 0.919
Mouse + Stats + Starting items | 0.959               | 0.586         | 0.929
Mouse + Stats + Boots only     | 0.959               | 0.615         | 0.928

Table 2: Accuracy of each model on each feature combination. The accuracies highlighted in red are of particular interest for discussion.

The overall accuracies of all three models were also lower than in the first problem, which shows the comparative difficulty of this problem when there is not a small, limited pool of players. Moreover, using additional time-slices caused less accurate predictions, which suggests noise is introduced when the featureset is repeated for each time-slice. Both the random forest classifier and the multi-layer perceptron lost accuracy as a match was sliced into more parts used together. This shows that although slicing gives more fine-grained data, the additional noise is detrimental to player prediction. We also found that each individual time-slice has little effect on the accuracy of predictions, as Figure 10 shows little performance difference between individual time-slices. This is rather unexpected, since the strategy and gameplay in different phases of a Dota 2 game change, so one would expect the mouse movement behavior of players to change as well.

Figure 8: Accuracy, precision and recall of using mouse movement features only when classifying pairs of matches.

Figure 9: Accuracy of using multiple time-slices.

Figure 10: Accuracy for mouse movement features on individual portions of a match.
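The time-slicing scheme described above can be sketched as follows: a match's per-sample feature stream is cut into equal slices, each slice is aggregated, and the aggregates are concatenated so the featureset repeats once per slice. The per-feature mean as the aggregate and the toy data are assumptions for illustration:

```python
def time_slice_features(samples, n_slices):
    """Split a match's per-sample feature rows into equal time-slices,
    aggregate each slice by the per-feature mean, and concatenate the
    slice aggregates so the featureset is repeated once per slice."""
    if n_slices < 1 or len(samples) < n_slices:
        raise ValueError("not enough samples for the requested slices")
    size = len(samples) // n_slices
    vector = []
    for i in range(n_slices):
        # The last slice absorbs any remainder samples.
        chunk = samples[i * size:(i + 1) * size] if i < n_slices - 1 else samples[i * size:]
        for j in range(len(chunk[0])):
            vector.append(sum(row[j] for row in chunk) / len(chunk))
    return vector

stream = [[float(i), 2.0 * i] for i in range(10)]  # toy 2-feature sample stream
print(time_slice_features(stream, 2))  # → [2.0, 4.0, 7.0, 14.0]
```

Each extra slice doubles, triples, etc. the length of the final vector, which is consistent with the observation that repeating the featureset per slice adds noise for the tree and network models.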
Game statistic features
Figure 13 shows the correlation matrix between pairs of matches by the same player. We can see how each statistic feature of one match is correlated with the statistic features of another match from the same player. For instance, the CS, denies, gold per minute, XP per minute and CS per minute statistics are well correlated with each other, even across different matches. These numbers would be expected to correlate within the same match, as creeps directly affect the gold and XP gained by heroes. However, seeing the statistics continue this trend across different matches suggests they are also tied to specific player behavior.

It should be noted that the correlation matrix is mirrored across the diagonal. This is a result of the combination step, where matches are combined into pairs of {match0, match1} and {match1, match0}, and should not be taken as a meaningful result. Because the clarity parser runs through matches chronologically, a bug in the library prevented us from splitting a match into time-slices for the game statistics, as we did for the mouse movement features.

Figure 11: Accuracy for game itemization features when classifying if two matches belong to the same player (hashed items, one-hot, starting items, boots only, item differences).

Once again, the random forest classifier performs the best when using the game statistic features, showing a strong correlation between game statistics and player prediction using decision-tree-based models. This seems to be due to decision trees being able to determine ranges of statistics for player prediction: for example, if one statistic is always in a similar range across multiple games, it is likely to be the same player.
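A cross-match correlation matrix of the kind in Figure 13 can be computed by stacking each pair's statistic vectors side by side and correlating the columns. A sketch with NumPy; the synthetic data and the feature names are assumptions standing in for the real per-match statistics:

```python
import numpy as np

rng = np.random.default_rng(1)

# One row per pair of matches by the same player; columns are the statistic
# features of match 0 followed by the same features of match 1.
stat_names = ["cs", "denies", "gpm", "xpm", "cs_per_min"]  # illustrative
tendency = rng.normal(size=(100, len(stat_names)))          # per-player tendency
match0 = tendency + rng.normal(scale=0.3, size=tendency.shape)
match1 = tendency + rng.normal(scale=0.3, size=tendency.shape)
pairs = np.hstack([match0, match1])

# Pearson correlation between every column; rowvar=False treats columns
# as variables. The off-diagonal block compares match 0 against match 1.
corr = np.corrcoef(pairs, rowvar=False)
cross = corr[:len(stat_names), len(stat_names):]
print(np.round(cross.diagonal(), 2))  # same-feature correlation across matches
```

Because the synthetic matches share a per-player tendency, the same-feature correlations across the two matches come out strongly positive, mirroring the trend the paper reports for real pairs.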
Itemization features
Figure 11 shows the accuracy with different itemization encodings and features. For these features, using only boots was the best indicator for the random forest and multi-layer perceptron, while the item differences were best for the logistic regression model. Once again, logistic regression did not have good results compared to the other models; the only exception is the item difference method of encoding, which is a bit of an anomaly, as the trend otherwise suggests logistic regression is not suitable for this classification task.

The difference between the item difference encoding and all other features is that its features are not duplicated for the pair of matches. For all other features, the features of both matches are used together, to see whether features across the two matches are correlated; the game statistic features shown earlier display this kind of correlation. For the item difference, however, only the presence or absence of the same item in an inventory slot is encoded, so there is no duplication of features. This reduces the complexity of the feature, which allowed the logistic regression model to perform comparably to the other two models where it could not on all other features.

The one-hot encoding of all items performed similarly to the one-hot encoding of only starting items. This was surprising because the one-hot encoding of all items contained 250 more items, which translates to 1500 more binary features. As was the case in the last experiment, the increased number of binary features had little effect on the performance of the models. The gap between the two methods is smaller than previously, which means using only starting items to differentiate between two players is more difficult than using them to identify a single player. This suggests a more diverse pool of players leads to more similarity in starting items between any two players.

Figure 12: Performance increases as different features are combined with the starting item feature for the random forest classifier (accuracy, precision and recall).
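The contrast between the duplicated one-hot encoding and the non-duplicated item difference encoding can be made concrete with a small sketch. The item vocabulary here is a tiny hypothetical stand-in for the real item pool:

```python
ITEM_VOCAB = ["boots", "tango", "ward", "blink"]  # illustrative, not the real ~250-item pool

def one_hot_items(inventory):
    """One-hot encode each inventory slot against the item vocabulary.
    For a pair of matches this encoding is emitted once per match, so the
    features are duplicated."""
    vec = []
    for slot in inventory:
        vec.extend(1 if slot == item else 0 for item in ITEM_VOCAB)
    return vec

def item_difference(inv_a, inv_b):
    """Encode, per slot, whether the two matches held the same item.
    The pair is reduced to a single vector, so nothing is duplicated."""
    return [1 if a == b else 0 for a, b in zip(inv_a, inv_b)]

inv_a = ["boots", "tango", "ward", "blink"]
inv_b = ["boots", "ward", "ward", "blink"]
print(one_hot_items(inv_a))           # 4 slots x 4 vocab items = 16 binary features
print(item_difference(inv_a, inv_b))  # → [1, 0, 1, 1]
```

The item difference vector grows only with the number of slots, not with the vocabulary size, which is one way to see why it is a much simpler feature for a linear model to separate.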
Combined features
Strong results were found for combining features together, unlike in the first experiment: all three models found improvements from using the different features together. For instance, the logistic regression model could not accurately predict whether two matches belonged to the same player for any feature except the item difference feature. However, when combining the item difference feature with the poorly performing mouse movement and game statistic features, we were able to get up to 10% higher accuracy on certain combinations.

For the multi-layer perceptron, we found precision generally increased with more featuresets at the cost of recall, leading to a smaller accuracy increase of 5-7%. High precision is more beneficial here than high recall because fewer false positives mean predictions that two matches belong to the same player are unlikely to be wrong, even if some pairs are missed as false negatives.

We found the random forest classifier achieved the highest accuracies in this problem when multiple features were combined. Figure 12 shows the improvement: the model went from 81% correct predictions using only starting items to 91%-95% correct predictions from combinations of starting items and other features. The highest accuracy, 95%, came from a combination of starting items, game statistics and mouse movement features.
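The precision-versus-recall trade-off discussed above rests on the standard definitions of the three metrics, with "same player" as the positive class. A self-contained sketch of how they relate:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall with label 1 = 'same player'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# A cautious classifier: it misses some same-player pairs (low recall),
# but every positive prediction it does make is correct (high precision).
acc, prec, rec = classification_metrics([1, 1, 1, 0], [1, 0, 0, 0])
print(acc, prec, round(rec, 3))
```

With these toy labels, precision is perfect while recall is low, which is exactly the regime the paper argues is preferable for flagging pairs of matches as the same player.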
FUTURE WORK
Fine grained information on mouse movement features
Although we have been successful in using mouse movement features to predict players with good results, many details remain unknown. For example, although we record a number of attributes for each mouse movement sequence, context such as the target entity or the current hero's properties is not recorded. Including the context for each mouse movement would allow us to learn whether certain actions under certain circumstances are more indicative of player behavior.

Figure 13: Correlation matrix for pairs by the same players.
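Per-sequence attributes of the kind referred to above can be illustrated with a small sketch. The specific attributes chosen here (path length, duration, mean speed, net displacement) are plausible examples of such summary statistics, not the paper's exact attribute list:

```python
import math

def mouse_sequence_features(points):
    """Summarize one mouse movement sequence of (time, x, y) samples
    into a dictionary of per-sequence attributes."""
    path = sum(math.hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]))
    duration = points[-1][0] - points[0][0]
    net = math.hypot(points[-1][1] - points[0][1],
                     points[-1][2] - points[0][2])
    return {
        "path_length": path,        # total distance the cursor traveled
        "duration": duration,       # elapsed time over the sequence
        "mean_speed": path / duration if duration else 0.0,
        "net_displacement": net,    # straight-line start-to-end distance
    }

seq = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 4.0)]  # move, then hold still
print(mouse_sequence_features(seq))
```

Adding context would amount to extending the returned dictionary with fields such as the hovered entity at each sample, which is the kind of extension this future-work item proposes.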
Multiple heroes from the same player
In this paper, we focused our research on the exploration and combination of features and models for player prediction, so we kept the hero used constant throughout the experiments. The same methodology is likely to work on different heroes, but it is not known how it would perform on a dataset containing multiple heroes: differences in playstyle between heroes would interfere with differences in player behavior, so it would be interesting to see how this work could be extended to a multi-class classification over multiple heroes.
Deep Learning
Our work primarily used traditional machine learning models such as logistic regression and random forest classifiers. With the increasing popularity and performance of deep neural networks, it would be interesting to compare how accurately a deep learning model could predict players compared to these traditional machine learning models.
Automated detection tool
Finally, because we have found strong results for classifying with random forest classifiers, our models can be applied to create a real-world tool for automatic detection. The tool can be prototyped to accept replays and return predictions. Successful prototyping with high accuracy and high precision could lead to the tool being used in online tournaments as a method to prevent cheating. The creation of such a tool would also help to refine the model by exposing it to a much larger variety of players and matches.
CONCLUSION
This work explores the extraction and use of various features from Dota 2 match replays, in combination with mouse tracking techniques, to predict who the player is based on their in-game behavior. We explore two approaches to this problem. Firstly, we predict the identity of a specific player from a pool of players. For this approach, we find that the use of game statistics with a random forest classifier can achieve high prediction rates, but the model could not combine poor features with good features without losing a lot of accuracy. Our second, more universal approach centers around matching (or otherwise) players from two different matches drawn from the set of all players. This approach is more generalizable to the large player base. We find that the combination of game starting items, mouse movements and game statistics using a random forest classifier can produce an accuracy of 95%.