Interplay of Game Incentives, Player Profiles and Task Difficulty in Games with a Purpose
Gloria Re Calegari and Irene Celino
Cefriel – Politecnico of Milano, Viale Sarca 226, 20126 Milano, Italy
{gloria.re,irene.celino}@cefriel.com

Abstract. How to take multiple factors into account when evaluating a Game with a Purpose? How is player behaviour or participation influenced by different incentives? How does player engagement impact their accuracy in solving tasks? In this paper, we present a detailed investigation of multiple factors affecting the evaluation of a GWAP and we show how they impact on the achieved results. We inform our study with the experimental assessment of a GWAP designed to solve a multinomial classification task.
Games with a Purpose (GWAPs) [1] are a well-known Human Computation approach [2] to encourage users to execute tasks with an entertaining reward. While several metrics are proposed in literature to evaluate the ability of GWAPs to achieve their intended purpose, there is a large number of factors that influence their success and effectiveness.

In order to fully understand the strengths as well as the weaknesses of a GWAP, we propose an approach that takes into account player characteristics (reliability, participation, behaviour and accuracy), game aspects (playing incentive, playing style and game nature) and features of the task to be solved (level of difficulty and variety). Our goal is to investigate the interplay between those different factors, by proposing a multi-faceted analysis framework that allows for a deep assessment and understanding of the efficacy of a GWAP to achieve its purpose. We apply the proposed framework to a specific GWAP to show the empirical results and the insights that can be drawn through our approach.

The original contributions of this paper are: (1) an extension of traditional GWAP metrics to take temporal evolution and incentive effects into account; (2) a comparison of engagement metrics and engagement profiles with non-gaming citizen science; and (3) the definition of GWAP-specific engagement profiles and their interplay with different factors (incentive, task difficulty and task variety).

The remainder of the paper is organized as follows: Section 2 illustrates the main related work; Section 3 gives details about the GWAP that we use to exemplify our approach; in the following sections, we propose different evaluation methods, by extending state-of-the-art metrics: global GWAP metrics and interplay with incentive are adopted in Section 4, Section 5 offers a comparison with citizen science user engagement profiles and Section 6 proposes new GWAP player profiling driven by measures of participation and accuracy; finally, Section 7 concludes the paper.
The basic metrics to evaluate GWAPs [1,2,3] are global indicators computed as means over the entire data; while effective in summarizing the behaviour of GWAP players, those are very simple measures that do not tell the entire story: an analysis of data distribution and temporal evolution is usually required to get a deeper understanding of a GWAP.

Some work exists on cross-feature analysis of GWAPs [4] and similarly on citizen science [5] and crowdsourcing [6]; our goal is to contribute to making such evaluation easier to replicate and reproduce.

Participation incentives are usually classified as intrinsic or extrinsic motivation [7]. Some comparative analysis of incentives exists for GWAPs [8], especially in contrast to different methods like micro-working [9,10,11] or machine learning [12]. The effect of competition and tangible rewards on participation and quality of results has also been explored, both in the context of GWAPs [13] and online citizen science campaigns [14], revealing the pros and cons of designing different motivation mechanisms.

Other metrics to evaluate GWAPs can be borrowed from studies of social community [15] and citizen science evolution [16]; in those cases, however, the “success” of user participation is measured through simple indicators like number of participants and contributions, while a deeper investigation is needed to assess the effectiveness of participation. Behavioural studies in HCI research have investigated volunteer characterization in citizen science, defining engagement metrics and profiles [17,18], which may or may not apply to GWAP players.

In the context of (paid) crowdsourcing, assessment is usually conducted in relation to micro-work platforms [19], in which important features are related to cost minimization [20,21], which is out of scope with respect to our work.

While Games with a Purpose are a well-known and widely adopted human computation method to involve users in task solution, a comprehensive assessment of their ability to address their “purpose” needs to take into account multiple factors affecting the game and the players. We therefore propose a multi-faceted analysis framework for GWAPs that includes game aspects, player characteristics and task features, with specific focus on the effect of game incentives on the overall GWAP efficacy.
The GWAP that we will use as running example is Night Knights, an online game for the multinomial classification of images. Pictures come from a massive public-domain dataset provided by NASA and they can be classified according to six different categories depending on their visual content. The classified images, in particular those labeled with three of the six categories, are then used in a subsequent scientific workflow in the field of astronomy and environmental sciences to measure light pollution effects (cf. [12]).

Fig. 1. Night Knights: the gameplay. (a) Classify an image; (b) Agreement; (c) Disagreement

The GWAP is inspired by the ESP game [3], because users play in random pairs according to an output-agreement mechanism [1]. The game adopts a repeated labeling approach [22] by asking different players to classify the same image; conversely, the same image is never given twice to the same player. Night Knights is built on top of our open source software framework for GWAPs [23].

The players visualize a picture and six buttons reporting the six possible categories (cf. Figure 1); the labeling task is therefore executed by clicking on the category that better fits the picture content. Each game round lasts one minute, during which players can classify as many images as they can (as detailed in the following, on average 15 pictures are played per round); each time the two players agree, they gain points and level up in the game leaderboard; some badges are also assigned in special conditions as additional game intrinsic incentives.

Players’ contributions are aggregated through an incremental truth inference algorithm [24] that (1) processes inputs as soon as they are provided, (2) weights players’ answers with a round-specific reliability measure [25] taking into account players’ answers on control tasks (for which the “true” solution is known), and (3) dynamically adjusts the number of required contributions. Our truth inference approach accounts for the very nature of GWAPs, in which usually there is no “deadline” for contributing, players’ varying attention can impact answer quality, and task difficulty calls for a dynamic estimation of the required number of repeated labels.

In this paper, we use the data collected through Night Knights. The game was released in February 2017 and then it was more extensively advertised for a related competition whose winner joined the 2017 Summer Expedition to observe the Solar Eclipse in the USA. The competition lasted about one month, from mid June to mid July 2017, and was addressed to all EU University students. After the end of the competition, the game has still been available online, but without any additional advertising. Overall, the data we analyse was collected in 9 months: one month of competition and 4 months before and after it.

In the following experimental sections, we apply a set of assessment methods on this game data. On the one hand, we exemplify the analyses we propose for a thorough multi-faceted assessment of GWAPs; on the other hand, we provide concrete results from the evaluation of Night Knights, which are, at least partially, typical of GWAPs.

The main metrics adopted in literature [2] to evaluate GWAPs are: throughput, computed as the average number of solved tasks per unit of time; average life play or ALP, i.e. the average time spent by each user playing the game; and expected contribution or EC, measured as the average number of tasks solved by each player. A task is solved when player contributions, aggregated by the truth inference algorithm [26], output a “true” solution. Those indicators are global measures, as they are computed as mean values over the entire GWAP use.
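The three global metrics defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log format (one `(player_id, minutes_played)` entry per play session), the function name `gwap_metrics` and the toy numbers are hypothetical, and throughput is normalized by the total play time, which is one possible choice of time base.

```python
from collections import defaultdict


def gwap_metrics(sessions, n_solved_tasks):
    """Return (throughput, ALP, EC) for a hypothetical play log.

    sessions: iterable of (player_id, minutes_played), one entry per play session.
    n_solved_tasks: number of tasks for which truth inference produced a solution.
    """
    minutes_per_player = defaultdict(float)
    for player_id, minutes in sessions:
        minutes_per_player[player_id] += minutes

    n_players = len(minutes_per_player)
    total_minutes = sum(minutes_per_player.values())

    throughput = n_solved_tasks / (total_minutes / 60.0)  # solved tasks per hour of play
    alp = total_minutes / n_players                       # minutes of play per player
    ec = n_solved_tasks / n_players                       # solved tasks per player
    return throughput, alp, ec


# Toy log (made-up numbers): three players, 120 minutes of play, 40 solved tasks.
toy_sessions = [("ann", 30), ("bob", 60), ("cat", 30)]
throughput, alp, ec = gwap_metrics(toy_sessions, n_solved_tasks=40)
print(throughput, alp, ec)
```

With these definitions, EC equals throughput multiplied by ALP once ALP is converted from minutes to hours, so only two of the three indicators are independent.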
Hereafter, we extend this analysis by assessing the influence of different game incentives and the evolution over time of game-play and engagement.

In particular, we investigate how player participation and GWAP results change with and without an extrinsic motivation such as a tangible reward [7]. We analyse the incentive effect in terms of both general statistics and specific metrics adopted in GWAP evaluation. We show that user participation can be strongly influenced by the presence of an extrinsic motivation.
In 9 months, Night Knights managed to engage about 650 users that played a substantial amount of time and classified almost 28,000 photos (cf. Table 1). The data is available with a CC-BY license at http://ckan.stars4all.eu/. Measuring the main metrics in the three periods (before, during and after the competition), we notice a significant increase of player participation during the competition, both in terms of given contributions and classified images (one order of magnitude higher with the additional incentive in both cases). This difference is clearly highlighted in Figure 2, which shows the temporal evolution of the number of images classified per day. The difference between throughput, ALP and EC in the competition and non-competition periods is statistically significant (t-test or Wilcoxon rank sum test at the 0.01 significance level). The play time also significantly increases during the competition period, as demonstrated by the ALP metric, which reaches values over 65 minutes/player (cf. Table 1).

                             Before   During   After
time span (months)
classified images
contributions
users                        285      174      174
total play time (hours)      65       471      29
throughput (tasks/hour)      69       212      113
ALP (mins/user)
EC (tasks/user)

Table 1. Experimental results in the three periods (before, during and after the introduction of the extrinsic motivation)

Those results indicate that providing a tangible reward to players can make them contribute more efficiently, speeding up the classification process (higher throughput), engaging them for a longer time (higher ALP), and ensuring a larger contribution rate to the human computation task (higher EC). As a global result, more tasks get solved.
Fig. 2. Number of images classified per day in the three periods
Adding a tangible prize to a game does not seem to ensure lasting effects. In Night Knights, looking deeper into the before and after periods in Table 1, we do not notice substantial differences in terms of classification and participation rate. The metrics of the before period are slightly higher, probably due to the fact that more users tried the game, attracted by advertising campaigns (small peaks in Figure 2) and by the novelty of the game.

Given this similarity, in our analysis we consider it worthwhile to distinguish only between intrinsic motivation periods (e.g., the Night Knights before and after periods together, when users play only to have fun) and extrinsic motivation periods (e.g., the during phase of Night Knights, with the tangible and valuable reward).
Defining contribution speed as the number of images played in each round, we check whether this metric is also influenced by a tangible reward.

As explained in Section 3, each round in Night Knights lasts one minute and each user is asked to classify one image at a time, so users have to be quick and classify as many images as possible to increase their score and be successful in the game. Given the image loading time, connection delays and waiting time for the other player’s answer, we estimate that in this case classifying each image takes at least 3–5 seconds, which means 12–20 photos per round.

As Figure 3 shows, in the extrinsic motivation period, the contribution speed follows an approximately normal distribution centered around 15 photos/round, while, in the intrinsic motivation phase, the distribution is flat and most players played fewer than 10 images/round. This indicates that, during the competition, all players did their best to classify as many images as possible, reaching a median value of 15, which is consistent with the estimated 12–20 images per round. On the other hand, in the intrinsic motivation period, people play the game in a more “relaxed” way, just to try and explore it, taking more time to answer.

Fig. 3. Distribution of the number of images played in each round: (a) extrinsic motivation, (b) intrinsic motivation
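The arithmetic behind the 12–20 images/round estimate, and the median used to summarize contribution speed, can be reproduced with a short Python sketch. The function names and the per-round counts below are hypothetical and chosen only for illustration; they are not taken from the Night Knights data.

```python
import statistics

ROUND_SECONDS = 60  # each Night Knights round lasts one minute


def images_per_round_bounds(min_seconds_per_image, max_seconds_per_image):
    """Lower and upper bound on images per round for a given time per image."""
    slowest = ROUND_SECONDS // max_seconds_per_image
    fastest = ROUND_SECONDS // min_seconds_per_image
    return slowest, fastest


# 3 to 5 seconds per image gives between 12 and 20 images per round.
bounds = images_per_round_bounds(3, 5)

# Hypothetical per-round counts for a motivated player.
toy_images_per_round = [13, 15, 16, 14, 15, 17, 15]
median_speed = statistics.median(toy_images_per_round)
print(bounds, median_speed)
```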
As a first step to the assessment of player behaviour, we adopt the engagement metrics proposed by [17]: activity ratio, number of days a user plays a game divided by the total number of days the user remains linked to the game; daily devoted time, average time (e.g. in hours) a user plays the game in each active day; relative active duration, ratio of days during which a player remains linked to the game and the total number of days since the player joined the game until the day the game is over (this metric can be computed only if a “game end” is envisaged, which is not always the case in GWAPs); and variation in periodicity, standard deviation of the intervals between each pair of non-consecutive active days. Computing those metrics for each player and then applying clustering techniques leads to the identification of engagement profiles.

Our goal is to assess if the profiles recognized in citizen science literature with respect to volunteer behaviour are also detected in GWAP player behaviour and if player profiles are affected by game incentives. Indeed, we expect player behaviour to differ from volunteer engagement.
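As an illustration, the four engagement metrics can be computed for a single player as in the following Python sketch. Several choices here are assumptions rather than details given in the text: the input is a hypothetical mapping from active dates to minutes played, the period in which a player is “linked” to the game is taken as the span from the first to the last active day (inclusive), intervals are measured between successive active days, and the population standard deviation is used.

```python
from datetime import date
import statistics


def engagement_metrics(minutes_per_day, game_end=None):
    """minutes_per_day: dict mapping each active date to the minutes played that day."""
    days = sorted(minutes_per_day)
    # Days the player stays linked to the game: first to last active day, inclusive.
    linked_days = (days[-1] - days[0]).days + 1
    # Intervals (in days) between successive active days.
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    metrics = {
        "activity_ratio": len(days) / linked_days,
        "daily_devoted_time": statistics.mean(list(minutes_per_day.values())) / 60.0,  # hours
        "variation_in_periodicity": statistics.pstdev(gaps) if len(gaps) > 1 else 0.0,
        "relative_active_duration": None,  # only defined when a game end exists
    }
    if game_end is not None:
        total_days = (game_end - days[0]).days + 1
        metrics["relative_active_duration"] = linked_days / total_days
    return metrics


# Hypothetical player active on four days of a contest ending on 14 July 2017.
toy_player = {
    date(2017, 6, 15): 30,
    date(2017, 6, 16): 60,
    date(2017, 6, 18): 30,
    date(2017, 6, 22): 60,
}
toy_metrics = engagement_metrics(toy_player, game_end=date(2017, 7, 14))
print(toy_metrics)
```

The resulting per-player vectors are what a clustering step would take as input.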
The mean values (and in brackets the standard deviations) of the four main engagement metrics defined by [17] are shown in Table 2. For Night Knights, we distinguish the global values and those measured during the competition only (extrinsic motivation period); for comparison, we also report the values for the citizen science initiatives illustrated in [17,18]. Daily devoted time for Night Knights is measured by approximation, multiplying the number of game rounds by the 1-minute round duration (the actual time is higher, because players also browse leaderboards, badges, played pictures, etc.); relative active duration is computed only during the competition time, where a “project finish time” is defined by the contest deadline.

                        Night Knights (global; compet.)   MW [17]        GZ [17]        WI [18]
Activity ratio          0.96 (0.17); (0.16)               0.40 (0.40)    0.33 (0.38)    0.32 (0.35)
Daily devoted time      (3.30)                            0.44 (0.54)    0.32 (0.40)    –
Rel. active duration    –; (0.35)                         0.20 (0.30)    0.23 (0.29)    0.43 (0.44)
Var. in periodicity     (2.12)                            18.27 (43.3)   25.23 (49.2)   5.11 (5.36)

Table 2. Engagement metrics (mean values and standard deviation in brackets): comparison of Night Knights (global values and competition-only metrics) with citizen science campaigns (MW: Milky Way, GZ: Galaxy Zoo, WI: Weather-it).

We observe that Night Knights players display quite a different behaviour with respect to volunteers: they show a 2-3 times higher activity ratio, and also consistently higher values for daily devoted time and relative active duration; this may mean that GWAP players tend to contribute in a more regular manner than volunteers. Focusing on the competition, those metrics also show a clear increase in engagement, with a significantly lower value of variation in periodicity, which suggests that the limited-time contest period stimulates players to access the game even more frequently and regularly.

Clustering players to identify engagement profiles does not give the same results as in the cited citizen science analyses [17,18]. Cross-validation between different methods (within groups sum of squares and Silhouette statistics) suggests an optimal clustering with 3 groups. Applying both agglomerative hierarchical clustering and K-means clustering yields similar and very unbalanced groupings, with one big cluster (around 90% of players) roughly corresponding to the hardworker profile (high activity ratio and low variation in periodicity); the remaining players are grouped in a small cluster that we can name “focused” hardworkers (similar to hardworkers but with higher daily devoted time) and another small cluster that does not clearly correspond to known profiles (low values of all metrics, but higher variation in periodicity). The spasmodic, persistent, lasting and moderate profiles defined in [17] are not observed. This can be interpreted as another difference between player and volunteer engagement, with game users either heavily playing and contributing, or simply trying out the game without being actually engaged.
If we also evaluate user engagement in terms of when players participated, i.e. for how long they played the game, from the first to the last played round, we discover that only a few users played the game both in the intrinsic motivation and extrinsic motivation periods; in particular, only 13 users played both before and during the competition and only 17 users became aware of the existence of the game during the competition and went on playing it after its end.

In addition, by analysing the users’ total active time (difference between the last and the first time a user played the game), we discover that most of the users played for a very short amount of time; 75% of players used the game for less than 5 minutes and only 10% played for more than a day.

These statistics are not surprising, because they are strong indicators of the game nature, which is a so-called casual game. Casual games are usually designed to be played in short bursts of a few minutes and then set aside. By their very nature, casual games target the short free/leisure time between the myriad of everyday tasks, such as between work and domestic obligations or between attention and distraction [27]. Regarding the overall time spent playing mobile games, the literature shows that an average gamer spends approximately 24 minutes every day playing games on mobile devices, with heavy gamers spending about 1 hour/day and light gamers about 2 minutes/day [28].
Given that volunteer profiles in citizen science do not seem to suitably describe GWAP players, we focus our investigation on two additional main metrics, player accuracy and player participation, more closely related to human computation, and analyse their interplay with different factors, like game incentive, task difficulty and task variety. The goal is to uncover GWAP-specific user behaviours and to identify GWAP-specific player profiles.

Player accuracy is measured ex-post by counting how many tasks each user correctly solved over the total number of tasks he/she played with; in this context, “correct” refers to the final task solution computed by the truth inference algorithm. Accuracy takes values between 0 and 1 and corresponds to the worker precision or labeling quality metrics used in crowdsourcing literature (e.g. [26]).
Player participation is measured as the total number of contribu-tions given by each user in the game rounds he/she played. While there are ofcourse alternative ways to measure participation (e.g., number of game rounds,total played time), we prefer to consider the number of contributions, since thisindicator is more closely related to the “task” execution and the game purpose.
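The two player-level measures can be sketched as follows in Python. The input format (a list of `(player_id, task_id, label)` contributions and a dictionary of final solutions) and the function name are hypothetical; in addition, the sketch assumes that accuracy is computed only over tasks that already have a final solution, while participation counts every contribution.

```python
from collections import defaultdict


def participation_and_accuracy(contributions, solutions):
    """Return {player_id: (participation, accuracy)}.

    contributions: iterable of (player_id, task_id, label).
    solutions: dict mapping task_id to the final label from truth inference.
    """
    given = defaultdict(int)    # all contributions (participation)
    judged = defaultdict(int)   # contributions on tasks with a final solution
    correct = defaultdict(int)  # contributions matching the final solution
    for player, task, label in contributions:
        given[player] += 1
        if task in solutions:
            judged[player] += 1
            correct[player] += int(label == solutions[task])
    return {
        player: (given[player],
                 correct[player] / judged[player] if judged[player] else None)
        for player in given
    }


# Made-up example: task t3 is not solved yet, so it does not affect accuracy.
toy_solutions = {"t1": "City", "t2": "Stars"}
toy_contributions = [
    ("ann", "t1", "City"), ("ann", "t2", "Stars"), ("ann", "t3", "City"),
    ("bob", "t1", "City"), ("bob", "t2", "Aurora"),
]
toy_scores = participation_and_accuracy(toy_contributions, toy_solutions)
print(toy_scores)
```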
Referring again to Night Knights data, we plot each user as a data point along participation and accuracy axes (cf. Figure 4). To divide players into groups, we applied clustering as in Section 5 but, at least in the case of Night Knights, the results put 98-99% of players in the same cluster, placing only “outliers” in the other clusters. Therefore, to define GWAP-specific profiles, we propose to simply set separation thresholds on the two axes, dividing the space into quadrants; more specifically, we adopt the median as separation value, which is a commonly used and robust statistic. While this choice is arbitrary, the rule itself is data-independent, thus the proposed approach can be adopted to analyse and compare different GWAPs without loss of generality.

Fig. 4. Players’ participation vs. accuracy and median values

The thresholds calculated on the Night Knights dataset are 12 contributions for the x-axis and 0.87 accuracy for the y-axis. The median value for participation roughly corresponds to the separation between those who played just a couple of game rounds and those who were more deeply engaged (cf. Section 4). The median accuracy value is quite high and this is a good sign about the GWAP efficacy to achieve its purpose; in other cases, when a specific minimum value of accuracy is required, the threshold choice could be driven by domain-specific considerations instead of being identified by the median.

By using this approach, the investigation space is divided into areas that represent different “behavioral” profiles as follows. Along the accuracy axis, we obtain two profiles: accurate players, i.e. players with an accuracy higher than the median, and the remaining inaccurate players (cf. Figure 5-a). Along the participation axis (cf. Figure 5-b), we define as casual players those who contribute less than the median, and as frequent players the most addicted and loyal contributors. Considering both dimensions, we define four profiles (cf. Figure 5-c):

– Beginners (bottom-left): this is the set of users that play the game for a short period of time, just for curiosity; this kind of player gives only a few contributions with low accuracy.
– Snipers (top-left): users that are very accurate in their contributions but contribute only a little. Ideally, they should be motivated to become champions, since their contributions are valuable.
– Champions (top-right): this is the most desirable category of players, since they have a high level of participation with very high accuracy.
– Trolls (bottom-right): this is the category of less desirable users, since they give a lot of inaccurate contributions; having a lot of Trolls in the game either makes the classification process longer, since it is harder to reach an agreement, or even leads to undesired results.

Fig. 5. Definition of GWAP-specific player profiles

Observing again Night Knights data, we can also quantitatively analyse the effect of game incentive on the profile composition (cf. Figure 6). With extrinsic motivation, most users (53%) acted as champions, and this share is much higher than in the total (32%). On the other hand, with the intrinsic motivation only, the presence of champions was lower, only 25%. This difference may indicate that the different incentives lead to different user behaviour; the presence of tangible rewards can engage users for a longer time and can motivate them to contribute with more effort and attention.

Fig. 6. Distribution of players between profiles, in total and with different incentives

With intrinsic motivation, the percentage of snipers was also higher than the average. The largest group of users in the intrinsic motivation period, however, was beginners (37%): probably this happened because they tried the game just for curiosity or to understand how the game works, without paying too much attention to the answers they gave. As expected, the number of beginners was very low with the extrinsic motivation, since players had a clear goal to play the game. Fortunately, the percentages of trolls were low in both periods. This suggests that the Night Knights game succeeded in avoiding too many spammers that could have made the classification process longer or more inaccurate.

While the above results are specific to Night Knights, the profile analysis can be applied to any other GWAP; indeed, examining the composition of a GWAP player population can reveal different behaviour and inform game re-design.

Finally, we would like to point out an insight that is not immediately evident in Figure 4: since the players on the right part of the plot are those who contributed more, if we sum the contributions from the four profiles, we obtain the figures in Table 3. In the case of our GWAP, therefore, the large majority of contributions comes from the most active and accurate players, which is reassuring with respect to the achievement of the game purpose.
                      Beginners   Snipers   Champions   Trolls
Task contributions    0.7%        0.4%      95.9%       3.0%

Table 3. Distribution of contributions across player profiles
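The median-based profile assignment and the per-profile share of contributions can be sketched in Python as follows. The player data is made up for illustration, and the handling of ties is an assumption derived from the wording of the definitions: players exactly at the participation median are treated as frequent, while players exactly at the accuracy median are treated as inaccurate.

```python
import statistics
from collections import defaultdict


def assign_profiles(players):
    """players: dict mapping player_id to (participation, accuracy)."""
    participation_median = statistics.median(p for p, _ in players.values())
    accuracy_median = statistics.median(a for _, a in players.values())
    names = {
        (False, False): "Beginners",  # casual and inaccurate (bottom-left)
        (False, True): "Snipers",     # casual and accurate (top-left)
        (True, True): "Champions",    # frequent and accurate (top-right)
        (True, False): "Trolls",      # frequent and inaccurate (bottom-right)
    }
    return {
        player: names[(participation >= participation_median,
                       accuracy > accuracy_median)]
        for player, (participation, accuracy) in players.items()
    }


def contribution_shares(players, profiles):
    """Fraction of all contributions coming from each profile."""
    totals = defaultdict(int)
    for player, (participation, _) in players.items():
        totals[profiles[player]] += participation
    overall = sum(totals.values())
    return {profile: count / overall for profile, count in totals.items()}


# Hypothetical players: (number of contributions, accuracy).
toy_players = {
    "p1": (3, 0.60), "p2": (5, 0.95), "p3": (40, 0.92),
    "p4": (60, 0.70), "p5": (12, 0.87),
}
toy_profiles = assign_profiles(toy_players)
toy_shares = contribution_shares(toy_players, toy_profiles)
print(toy_profiles)
print(toy_shares)
```

Where a minimum accuracy is required by the application, the accuracy median in this sketch could be replaced by a fixed domain-specific threshold.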
In the following, we analyse the interplay between player accuracy and playerparticipation by taking into account additional factors. More specifically, wecheck if there is a statistically significant difference between the mean accuracyof casual and frequent players with respect to some control variables, namelythe incentive type, the task difficulty and the task variety.
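A check of this kind can be illustrated with a small Python sketch. Note that it uses a two-sided permutation test on the difference of means as a standard-library-only stand-in; it is not the t-test applied in our analysis, and the accuracy values and the function name are made up for illustration.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point noise
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-player accuracies for the two groups.
toy_casual = [0.70, 0.72, 0.75, 0.74, 0.78, 0.73, 0.76, 0.71]
toy_frequent = [0.88, 0.90, 0.85, 0.91, 0.87, 0.92, 0.89, 0.86]
p_value = permutation_test_mean_diff(toy_casual, toy_frequent)
print(p_value)
```

A small p-value means that a difference at least as large as the observed one rarely arises when group labels are shuffled at random.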
To assess the role of the incentive type, we check for a difference in mean accuracy between casual and frequent players in the intrinsic and extrinsic motivation periods.

In Night Knights, the average accuracy of the frequent players is higher than that of casual players in both periods, as shown in the first two boxplots of Figure 7; this difference is also statistically significant (p-value of the t-test less than 0.05). We also notice a relative increase in mean accuracy of roughly 8–10% (0.07 in absolute terms) when a tangible reward is present (from 0.74 to 0.81 for casual and from 0.83 to 0.90 for frequent players): since during the competition users were encouraged to play to win the prize, they paid more attention to the image classification, also raising the quality of the answers.

Fig. 7. Accuracy distribution of casual and frequent players with different incentives (a and b) and with different task difficulty (c and d). The difference between players’ profiles is statistically significant in all cases except for easy tasks.

This may indicate that in GWAPs frequent players contribute in a more accurate way than casual ones, and that extrinsic motivation has a positive impact on accuracy.

We define task difficulty as the number of different users needed to solve it (the higher the number, the harder the task); this is because our incremental truth inference algorithm (cf. Section 3) dynamically estimates the number of contributions required to solve a task. We split the images in two sets based on their difficulty and we check if this impacts player behaviour.

For Night Knights, we marked as “easy” the images that require only 4 contributions (the minimum number to reach an agreement according to our domain experts), and as “difficult” those that required more contributions. “Easy” images are 58% of all classified images, while the number of contributions required to classify “difficult” images ranges from 5 to 17.

As shown in the (c) and (d) boxplots in Figure 7, accuracy on “easy” images is almost the same between casual and frequent players (indeed, the difference in mean accuracies is not statistically significant). On the contrary, this difference is statistically significant for “difficult” images (mean accuracy is 0.84 for frequent players and 0.68 for casual players).

Those results suggest a learning effect in GWAPs: the more a user plays the game, the more he/she understands the task to be solved, thus increasing his/her accuracy and consequently also result quality.
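The easy/difficult split and the comparison of the two player groups can be sketched in Python as follows. The data structures and names are hypothetical, and the sketch pools all answers of a group to compute the share of correct ones, which is a simplification with respect to averaging per-player accuracies.

```python
from collections import defaultdict

EASY_CONTRIBUTIONS = 4  # minimum number of contributions needed to solve a task


def difficulty(n_contributions):
    """Label a task by the number of contributions that were needed to solve it."""
    return "easy" if n_contributions <= EASY_CONTRIBUTIONS else "difficult"


def accuracy_by_group_and_difficulty(answers, solutions, contributions_needed):
    """Return {(group, difficulty): share of correct answers}.

    answers: iterable of (group, task_id, label), with group "casual" or "frequent".
    solutions: dict mapping task_id to the final label.
    contributions_needed: dict mapping task_id to the contributions used to solve it.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, task, label in answers:
        key = (group, difficulty(contributions_needed[task]))
        total[key] += 1
        correct[key] += int(label == solutions[task])
    return {key: correct[key] / total[key] for key in total}


# Made-up data: t1 is an easy image, t2 needed nine contributions.
toy_task_solutions = {"t1": "City", "t2": "Stars"}
toy_needed = {"t1": 4, "t2": 9}
toy_answers = [
    ("casual", "t1", "City"), ("casual", "t2", "Aurora"), ("casual", "t2", "Stars"),
    ("frequent", "t1", "City"), ("frequent", "t2", "Stars"), ("frequent", "t2", "Stars"),
]
toy_accuracy = accuracy_by_group_and_difficulty(toy_answers, toy_task_solutions, toy_needed)
print(toy_accuracy)
```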
Since Night Knights aims to solve a multinomial classification task, we investigate whether there is any evident phenomenon related to the different image categories. Therefore, we compute again the accuracies of the two groups of casual and frequent players in classifying the 6 output classes. We summarize the mean accuracy values in Table 4.

            Black   City   Stars   Aurora   ISS   None
Casual
Frequent

Table 4. Mean accuracy of casual and frequent players with images of different categories. The difference is not statistically significant for any of the categories.

Applying the t-test to check if the mean accuracy is different for the two players’ profiles, we cannot reject the null hypothesis. This may mean that any player is equally able/unable to distinguish the different categories, independently of his/her level of participation; indeed, in our GWAP, there is no need for background- or domain-specific knowledge to play the game. This analysis can help in identifying the need for training or expert knowledge of GWAP players.

On the other hand, the mean accuracy values change a lot across different categories, spanning between 0.57 and 0.91. This is also explained by the different distribution of easy/difficult tasks across the variety of classes, as shown in Figure 8. Indeed, some categories are intrinsically more difficult to classify than others, but Table 4 shows that this complexity related to task variety is equally perceived by players with low and high levels of participation.
Fig. 8. Distribution of easy/difficult tasks across different image categories.
In this paper, we presented an investigation of the interplay of different factors in the evaluation of GWAP results. More specifically, we focused on the profiling of players according to different user metrics and we studied the influence of game incentive and task characteristics.

To inform our discussion, we described the results of such multi-dimensional analysis over the data collected by a GWAP for multinomial classification of images. While some of our considerations result from the quantitative analysis of a single game, and are not per se generalizable, we believe that the proposed approach is replicable to evaluate any other GWAP. We believe that such deeper analysis is an important (and sometimes neglected) investigation to understand players’ behaviour, to evaluate the impact of various factors on reliability and quality, and finally to assess the ability of GWAPs to achieve their intended purpose and its sustainability over time.

Finally, we would like to point out that, even when player participation is limited in time, a classification GWAP can be used to build a reasonably large training set to be used in traditional machine learning settings to train classifiers for larger-scale labeling. In our previous work, we showed that humans and machines indeed agree on image classification for the Night Knights dataset [12].
Acknowledgments
This work is partially supported by the STARS4ALL project (H2020-688135), co-funded by the European Commission. We thank all the Night Knights players who contributed to the classification task solution and allowed us to perform this work.
References
1. Von Ahn, L., Dabbish, L.: Designing games with a purpose. Communications of the ACM (8) (2008) 58–67
2. Law, E., Ahn, L.v.: Human computation. Synthesis Lectures on Artificial Intelligence and Machine Learning (3) (2011) 1–121
3. Von Ahn, L., Dabbish, L.: Labeling images with a computer game. In: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM (2004) 319–326
4. Singh, A., Ahsan, F., Blanchette, M., Waldispühl, J.: Lessons from an online massive genomics computer game. In: Proceedings of the Fifth Conference on Human Computation and Crowdsourcing (HCOMP 2017). (2017)
5. Sauermann, H., Franzoni, C.: Crowd science user contribution patterns and their implications. Proceedings of the National Academy of Sciences (3) (2015) 679–684
6. Yang, J., Redi, J., Demartini, G., Bozzon, A.: Modeling task complexity in crowdsourcing. In: Fourth AAAI Conference on Human Computation and Crowdsourcing. (2016)
7. Ryan, R.M., Deci, E.L.: Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary educational psychology (1) (2000) 54–67
8. Prestopnik, N., Crowston, K., Wang, J.: Gamers, citizen scientists, and data: Exploring participant contributions in two games with a purpose. Computers in Human Behavior (2017) 254–268
9. Thaler, S., Simperl, E., Wolger, S.: An experiment in comparing human-computation techniques. IEEE Internet Computing (2012) 52–58
10. Feyisetan, O., Simperl, E., Van Kleek, M., Shadbolt, N.: Improving paid microtasks through gamification and adaptive furtherance incentives. In: Proceedings of the 24th International Conference on World Wide Web. WWW ’15, Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee (2015) 333–343
11. Feyisetan, O., Simperl, E.: Social incentives in paid collaborative crowdsourcing. ACM Trans. Intell. Syst. Technol. (6) (2017) 73:1–73:31
12. Re Calegari, G., Nasi, G., Celino, I.: Human computation vs. machine learning: an experimental comparison for image classification. Human Computation Journal (1) (2018) 13–30
13. Siu, K., Zook, A., Riedl, M.O.: Collaboration versus competition: Design and evaluation of mechanics for games with a purpose. In: Proceedings of Foundations of Digital Games Conference. (2014)
14. Reeves, N., West, P., Simperl, E.: “A game without competition is hardly a game”: The impact of competitions on player activity in a human computation game. In: Proceedings of Human Computation Conference. (2018)
15. Reeves, N., Tinati, R., Zerr, S., Van Kleek, M., Simperl, E.: From crowd to community: A survey of online community features in citizen science projects. In: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW 2017. (2017) 2137–2152
16. Celino, I., Corcho, Ó., Hölker, F., Simperl, E.: Citizen science: Design and engagement (Dagstuhl seminar 17272). Dagstuhl Reports (7) (2017) 22–43
17. Ponciano, L., Brasileiro, F.: Finding volunteers’ engagement profiles in human computation for citizen science projects. Human Computation Journal (2) (2015) 247–266
18. Aristeidou, M., Scanlon, E., Sharples, M.: Profiles of engagement in online communities of citizen science participation. Computers in Human Behavior (2017) 246–256
19. Allahbakhsh, M., Benatallah, B., Ignjatovic, A., Motahari-Nezhad, H.R., Bertino, E., Dustdar, S.: Quality control in crowdsourcing systems: Issues and directions. IEEE Internet Computing (2) (2013) 76–81
20. Karger, D.R., Oh, S., Shah, D.: Budget-optimal task allocation for reliable crowdsourcing systems. Operations Research (1) (2014) 1–24
21. Han, T., Sun, H., Song, Y., Wang, Z., Liu, X.: Budgeted task scheduling for crowdsourced knowledge acquisition. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM (2017) 1059–1068
22. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM (2008) 614–622
23. Re Calegari, G., Fiano, A., Celino, I.: A Framework to build Games with a Purpose for Linked Data Refinement. In: Proceedings of the International Semantic Web Conference 2018, Resources Track. (2018)
24. Celino, I., Re Calegari, G.: An Incremental Truth Inference Approach to Aggregate Crowdsourcing Contributions in GWAPs. In: currently under revision. (2018)
25. Celino, I., Contessa, S., Corubolo, M., Dell’Aglio, D., Della Valle, E., Fumeo, S., Krüger, T.: Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch. In: Proceedings of the 11th international conference on The Semantic Web, Springer-Verlag (2012) 34–49
26. Zheng, Y., Li, G., Li, Y., Shan, C., Cheng, R.: Truth inference in crowdsourcing: is the problem solved? Proceedings of the VLDB Endowment 10