Tracing Player Knowledge in a Parallel Programming Educational Game

Pavan Kantharaju, Katelyn Alderfer, Jichen Zhu, Bruce Char, Brian Smith, Santiago Ontañón
Drexel University, Philadelphia, PA 19104
{pk398, kmb562, jz465, charbw, bks59, so367}@drexel.edu

Abstract
This paper focuses on tracing player knowledge in educational games. Specifically, given a set of concepts or skills required to master a game, the goal is to estimate the likelihood with which the current player has mastered each of those concepts or skills. The main contribution of the paper is an approach that integrates machine learning and domain-knowledge rules to detect when the player applied a certain skill and either succeeded or failed. This is then given as input to a standard knowledge-tracing module (such as those from Intelligent Tutoring Systems) to perform knowledge tracing. We evaluate our approach in the context of Parallel, an educational game for teaching parallel and concurrent programming, with data collected from real users, showing our approach can predict students' skills with a low mean-squared error.
Introduction
This paper focuses on the problem of player/student knowledge modeling in the context of educational games. Player knowledge modeling is the problem of estimating the amount of knowledge or mastery that the current player possesses in a certain set of concepts or skills of interest. This problem has been studied in several fields, such as game AI and Intelligent Tutoring Systems (ITS), where it is known as "knowledge tracing" (Corbett and Anderson 1994; Pavlik Jr, Cen, and Koedinger 2009). Specifically, in this paper we present a new knowledge-modeling approach designed to monitor, in real time, the degree of mastery that the current player has of the different skills required to play an educational game, based only on in-game player activity. A key contribution of this work is being able to perform knowledge modeling in complex educational games, where it is hard to determine when a student applied a certain skill, and whether the application was successful or unsuccessful.

Our approach is presented in the context of Parallel (Ontañón et al. 2017), an educational game for teaching parallel and concurrent programming. Parallel is an adaptive game that presents different players with different level progressions, depending on their needs. In order for this adaptation to happen, the game needs to maintain an estimate of the current player's knowledge (which concepts the player understands and which they do not). It then presents each player with levels that are generated via procedural content generation (PCG) based on this estimate.

To address this problem, we build upon existing work on player modeling (Yannakakis et al. 2013), and on knowledge tracing from the Intelligent Tutoring Systems (ITS) community (Corbett and Anderson 1994; Pavlik Jr, Cen, and Koedinger 2009). One of the challenges we faced when attempting to integrate existing work on knowledge tracing into Parallel is that knowledge tracing assumes that assessing whether students are successfully or unsuccessfully deploying certain skills is easy. However, in real gameplay sessions students deploy skills in an interleaved manner while playing the game, making this assessment non-trivial. For example, students might drag and drop different elements onto the game board just to explore the behavior of certain game elements, making it hard to assess when they deployed a skill correctly or incorrectly.

Our main contribution integrates supervised machine learning, inspired by existing work on player modeling, with domain knowledge to assess successful skill application, and applies this assessment to knowledge tracing for educational games. Specifically, the problem outlined above is addressed by capturing live telemetry data and using machine learning over time windows to predict the problem-solving strategy (e.g., "trial and error") students are currently deploying. This predicted strategy is used as a proxy for whether they understand the different skills involved in the current level, and is then fed to a knowledge-tracing framework. To improve the accuracy of the model, we include a collection of domain-specific rules that complement the machine-learning approach. We evaluated our approach using transcripts of several think-aloud user-study sessions to manually generate ground truth against which to compare the predictions made by our model, showing a relatively low prediction error. Our results also show that predicting problem-solving strategy from a series of time windows can be used to identify successful and unsuccessful applications of skills.
Background
Two major areas of work are related to the work presented in this paper: player modeling and knowledge tracing. In a game environment, a player model is an abstracted description of a player capturing certain properties of interest, such as preferences, strategies, strengths, or skills (Van Der Werf et al. 2003). Significant work exists in areas such as modeling player preferences in order to maximize engagement (Riedl et al. 2008; Thue et al. 2007), providing better non-player-character AI (Weber and Mateas 2009), and game analytics (Canossa 2013). For an overview of player modeling, readers are referred to existing surveys (Smith et al. 2011; Machado, Fantini, and Chaimowicz 2011).

The two approaches to modeling student knowledge most related to our work are Knowledge Tracing (KT) and Performance Factor Analysis (PFA). We refer interested readers to Harrison and Roberts (2012) for an overview of these techniques and others used in Intelligent Tutoring Systems (ITSs) (Sleeman and Brown 1982). One of the most common types of KT is Bayesian Knowledge Tracing (BKT) (Corbett and Anderson 1994), which uses Hidden Markov Models. The model parameters are trained using Expectation Maximization (Dempster, Laird, and Rubin 1977), Conjugate Gradient Search (Corbett and Anderson 1994), or brute-force search (Baker, Corbett, and Aleven 2008).

One issue with Knowledge Tracing is that it assumes a one-to-one mapping between questions and skills. However, in practice, questions usually require multiple skills to answer them correctly (levels in Parallel require multiple interleaved skills). PFA (Pavlik Jr, Cen, and Koedinger 2009), based on prior work on Learning Factor Analysis (Cen, Koedinger, and Junker 2006), addresses this issue by using a model independent of questions. Specifically, PFA models the performance of a student as follows:

$$m(u, KC, c, n) = \sum_{j \in KC} (\beta_j + \gamma_j c_{u,j} + \rho_j n_{u,j}) \quad (1)$$

$$p_u(m) = \frac{1}{1 + e^{-m}} \quad (2)$$

where the performance of a student $u$ is defined, given a set of skills $KC$, as a function of the number of times ($c_{u,j}$) that the student has correctly applied a given skill $j$, or failed to apply it ($n_{u,j}$), in the past. The final performance of the student ($p_u$) is then assessed using a logistic model (Equation 2). The model parameters ($\beta_j$, $\gamma_j$, and $\rho_j$) can be trained using logistic regression. Thus, one necessary input to these models is the assessment of successful or unsuccessful application of skills. This is not trivial in the context of educational games like Parallel, and is one of the contributions of this paper. We compare our work against a modification of PFA (described in our experimental evaluation section) for Parallel. Although it is possible to extend BKT to multiple-skill questions (Gong, Beck, and Heffernan 2010), comparing against BKT is part of our future work.
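As an illustration, the PFA update of Equations 1 and 2 can be written in a few lines of Python. The skill names and parameter values below are hypothetical; in a real model, $\beta$, $\gamma$, and $\rho$ are fit by logistic regression over observed attempts.

```python
import math

def pfa_performance(skills, beta, gamma, rho, successes, failures):
    """PFA (Eq. 1-2): the logit m sums, over the skills a question requires,
    a per-skill bias plus weighted counts of prior correct (c) and
    incorrect (n) applications of that skill."""
    m = sum(beta[j] + gamma[j] * successes[j] + rho[j] * failures[j]
            for j in skills)
    # Logistic link (Eq. 2): probability the student performs correctly.
    return 1.0 / (1.0 + math.exp(-m))

# Hypothetical parameters for two skills.
beta  = {"semaphores": -1.0, "signals": -0.5}
gamma = {"semaphores": 0.4,  "signals": 0.3}   # credit per prior success
rho   = {"semaphores": -0.2, "signals": -0.1}  # penalty per prior failure

p = pfa_performance(["semaphores", "signals"], beta, gamma, rho,
                    successes={"semaphores": 3, "signals": 1},
                    failures={"semaphores": 1, "signals": 2})
print(round(p, 3))  # → 0.401
```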
Parallel: A Game for Learning Parallel and Concurrent Programming Concepts
Parallel (Ontañón et al. 2017) is an educational computer game designed to help students learn parallel and concurrent programming concepts, as well as to help us understand their learning processes. The game renders different parallel and concurrent programming concepts visually, and is designed as a puzzle game, where students must solve different problems in order to advance to the next level.

[Figure 1: Parallel's visual representation of the "Cigarette Smokers Problem" (top), and a possible solution (bottom), with colors used to highlight which arrows move through which tracks and their directions.]

Within each level (e.g., Figure 1), players see a collection of arrows that follow different tracks (black lines, representing programs). These arrows, which run at varying and unpredictable speeds (to represent the non-determinism inherent in concurrent programming), represent threads. The main goal is to design synchronization mechanisms so that the arrows accomplish the given challenge (e.g., deliver packages while preventing a "race condition").

At any time while composing the solution to a level, a player can press the "test" button, which makes the game run a simulation of the current solution (with arrows moving stochastically). The solution might succeed or fail, but the fact that it succeeds does not mean that it is a correct solution, since if the arrows had moved at different relative speeds, the solution might fail. Students are free to "test" their solutions as many times as they want before pressing the "submit" button, which causes a model checker to use systematic search to test every possible schedule of arrow movements and determine whether the solution would work in all possible situations. If it does, the level is considered solved.

The game presents problems of varying complexity in the form of different levels, from preventing simple race conditions to solving classic situations such as the "cigarette smokers problem" (shown in Figure 1). Figure 1 (bottom) shows a possible solution to this level with all the semaphores and signals in place and connected in the right way. Additionally, we colored the tracks with the color of the arrows that can go through them.
[Figure 2: Student Knowledge Modeling Process. Telemetry data from time windows, as the player plays, is converted into feature vectors (Step 1); machine learning predicts the problem-solving strategy and domain knowledge rules fire on the same windows (Step 2: Skill Success/Failure Detection); knowledge tracing combines both, given the skills required for the current level, into the player knowledge model SV_u (Step 3: Knowledge Tracing).]

Notice that solutions to levels like this are non-trivial, and would be very hard to find by trial and error. However, by deploying concepts from parallel programming (such as the idea of "identifying the critical section"), the solution is easier to find, and corresponds exactly to the typical solution to this problem in concurrent programming textbooks (Downey 2008). Parallel has been deployed twice in a real undergraduate computer science course to help teach students about parallel programming.
Player Knowledge Modeling in Parallel

The goal of player modeling in Parallel is to generate the sequence of levels that a player will experience. Players should play game levels that help them practice the concepts they do not currently understand, and that are feasible given their current mastery of parallel programming. The output of our player-modeling approach is a player knowledge model capturing the current player's level of mastery of a set of concepts/skills required to play our game. In order to elicit the set of required concepts/skills, we employed the methodology defined by Horn, Cooper, and Deterding (2017), which resulted in the 21 skills shown in the left column of Table 1, which we denote by $KC$.

Given $KC$, the knowledge model of a given student $u$ is represented by a vector $SV_u = \langle p(s_1), p(s_2), \ldots, p(s_m) \rangle$, where each element represents the student's understanding of a skill $s_i \in KC$ as the likelihood that the student has mastered that skill (in the interval from 0 to 1). A challenge in our domain is that, unlike in previous work on ITS, it is not obvious how to identify when a player has attempted to deploy a skill, or whether that deployment succeeded or failed. Suppose a level requires a student to apply two skills from Table 1 at the same time, Block critical sections and Use diverters, but the user is just focusing on making the arrows use the diverters in the right way. The fact that, for some time, they did not block the critical sections does not mean that they failed at applying that skill, but that they focused on something different. It is not trivial to infer what a player was focusing on, as they are free to place elements anywhere within the level and "test" the proposed solution at any time.

In order to address this problem, our approach uses the idea of "time windows", and tries to identify successful/failed attempts of skill application in each time window. Specifically, it has three main steps (illustrated in Figure 2):

• Step 1: Feature Extraction: Real-time telemetry data concerning player actions is collected and divided into time windows of fixed size. From each window, a collection of features is computed.

• Step 2: Skill Success/Failure Detection: This step identifies whether a player was trying to successfully or unsuccessfully apply a certain skill. We employ two complementary approaches: a machine-learning component, which predicts the problem-solving strategy the player is currently deploying, from which successes or failures can be inferred as detailed below; and a collection of domain knowledge rules, which can directly identify instances of skill application for some skills (shown in Table 1).

• Step 3: Knowledge Tracing: This step builds a knowledge model of a player using the output of the previous steps.

The remainder of this section describes each of these steps.
Step 1: Feature Extraction
Given a sequence of telemetry information of a student for a level, and a tunable time-interval size $\tau$, we extract a feature vector from the telemetry information in each time interval $(t, t + \tau)$, sliding the window by a fixed fraction of $\tau$ between consecutive windows. The feature vector consists of frequencies and rates (per minute) of different actions and events that occurred in the game, as well as time differences between certain actions done by the player. We currently calculate 65 features.

Step 2: Skill Success/Failure Detection
We use two complementary approaches to detect when players succeed or fail in deploying a skill: supervised machine learning and a collection of domain knowledge rules.
Machine Learning: Given that it is hard to predict when students succeeded or failed in deploying skills in general, our approach instead predicts the problem-solving strategy a student is deploying. After analyzing transcripts of think-aloud sessions from earlier user studies concerning how players play Parallel, we identified three basic problem-solving strategies: Trial and Error, Sequential Thinking, and Parallel Thinking. We define Parallel Thinking as the process of considering multiple arrows (threads) at the same time, whereas Sequential Thinking is defined as solving levels by considering one arrow at a time. Trial and Error is the process of repeatedly trying different solutions to the current level by chance, without any specific purpose. Given the set of skills in a given level $l$, $KC_l$, when our machine-learning module predicts that the student is deploying trial and error for a given time window, we signal a failed attempt at deploying all the concepts of $KC_l$ (notice that the student might have failed at just one of the concepts, but since we do not know which one, we signal them all). When we predict parallel thinking, we signal a successful application of the skills in $KC_l$, and for sequential thinking, we signal all skills with a 0.5 probability of success. Notice the strong assumption in this process: when students deploy trial and error, we assume that they do not understand the logic behind a level. The high predictive accuracy reported in our experiments shows, however, that this assumption works in practice.

To train the machine-learning model, we generate a training set from a set of annotated play-throughs. These annotations correspond to problem-solving strategies that were determined through analysis of one-on-one game-playing sessions with each student, using a think-aloud protocol, by a member of our research team (and revised by another). Standard supervised classification was used in our experiments, as described in the evaluation section.

Domain Knowledge Rules:
Additionally, a collection of manually defined domain knowledge rules is applied to each time window in order to detect additional evidence of successful or failed skill application. Table 1 shows the rules used in our experiments. Each rule detects a certain skill using both telemetry information and information from the game's model checker, and was hand-authored by observing video recordings of students playing the game. Every time a rule fires, we signal a successful application of the corresponding skill.
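Two of the Table 1 rules can be sketched as simple predicates over a window of telemetry events. The event encoding below is an assumption for illustration; the actual game logs use their own format.

```python
def rule_drag_objects(events):
    """Table 1 rule 'Drag objects': counts windows events where the player
    clicks and drags either a semaphore or a signal."""
    return sum(1 for e in events
               if e["type"] == "drag" and e["object"] in ("semaphore", "signal"))

def rule_test_before_submit(events):
    """Table 1 rule 'Testing before submitting': fires (returns 1) when a
    'test' action precedes the first 'submit' in the window."""
    for e in events:
        if e["type"] == "test":
            return 1
        if e["type"] == "submit":
            return 0
    return 0

# A hypothetical time window of telemetry events.
window = [{"type": "drag", "object": "semaphore"},
          {"type": "test"},
          {"type": "drag", "object": "signal"},
          {"type": "submit"}]
print(rule_drag_objects(window), rule_test_before_submit(window))  # → 2 1
```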
Step 3: Knowledge Tracing
Next, we describe the process of updating the student knowledge model using the output of the machine learning and the domain knowledge rules. Recall that time windows of telemetry data are generated while a player is playing a level. Let $F$ be the set of time windows generated for the current play-through of a player $u$ (which might include one or more levels), and let $ML(f)$ denote the output of the machine-learning classifier for time window $f$, encoding trial and error as 0, sequential thinking as 0.5, and parallel thinking as 1. Let $R_{s_i}(f)$ be the number of times domain knowledge rule $R_{s_i}$ was applied to time window $f$. Also, let $I_{s_i}(f)$ be the indicator function that is 1 if skill $s_i$ was involved in the level from which $f$ was extracted, and 0 otherwise. With these definitions, let us now define $R_{s_i}(F) = \sum_{f \in F} R_{s_i}(f)$, $I_{s_i}(F) = \sum_{f \in F} I_{s_i}(f)$, and $ML_{s_i}(F) = \sum_{f \in F \mid I_{s_i}(f) = 1} ML(f)$. Now, for a given student $u$, we compute the understanding of each skill $s_i \in KC$ in the knowledge model $SV_u$ as follows:

$$p(s_i) = \frac{ML_{s_i}(F) + R_{s_i}(F)}{I_{s_i}(F) + R_{s_i}(F)} \quad (3)$$

Notice this is an average of the predictions made for a given skill by the machine learning and the domain knowledge rules.

Experimental Evaluation
This section details our experimental evaluation, including datasets, setup, and results. Concerning the machine-learning module, we evaluated seven different machine-learning techniques provided by the WEKA framework (Frank, Hall, and Witten 2016): J48, Random Forest, Bagging, AdaBoost, Naive Bayes, Bayes Net, and Multilayer Perceptron. All machine-learning algorithms were run with their default parameters. Additionally, we compared the performance of our approach against PFA (Pavlik Jr, Cen, and Koedinger 2009).
Datasets and Ground Truth
All datasets consist of telemetry logs from play sessions of Parallel by real undergraduate students. Each log contains mouse movements and events triggered by the player.
Datasets:
In our experiments, we use two different datasets. The first dataset, which we refer to as Dataset A, contains data from 31 levels and was gathered from one-on-one sessions with six students, whom we asked to think aloud about their thinking process as they played six levels in our game (not all students completed all levels). The second dataset, which we call Dataset B, contains data from 395 levels collected from an undergraduate parallel programming course of 17 students. For four weeks, each student was required to play some set of levels each week at home. The sets of levels played in Datasets A and B were different.
Ground Truth:
The ground truth for Dataset A (both for problem-solving strategies and for the level of mastery of each of the skills in the game) was manually generated by researchers on the team from transcripts of the think-aloud sessions and recorded videos of the gameplay. For each student, we calculated ground-truth values for all skills for all the levels they played. For Dataset B, we asked each student in the undergraduate class to fill out a weekly survey assessing how well they understood each parallel programming concept, which we use as the ground truth. Since Dataset B has no think-aloud transcripts, no ground truth on problem-solving strategy exists for it; thus, it cannot be used for training, but only to test our approach.
Experimental Setup
We had three objectives for our experimental evaluation. First, we wanted to analyze the accuracy of machine-learning techniques in predicting problem-solving strategies (Experiment 1). Second, we wanted to evaluate the performance of combining machine learning and domain knowledge for player knowledge modeling (Experiments 2 and 3). Finally, even though some of the information required to execute Performance Factor Analysis (PFA) is not directly available in our setting, we wanted to compare against PFA assuming that this information were available (Experiment 4).
Experiment 1:
We computed prediction accuracy for each machine-learning classifier in predicting problem-solving strategy over three time intervals $\tau$ (shown in Table 2), using leave-one-student-out cross-validation on the 31 traces from Dataset A: we tested on the data from one student and trained on the data from the remaining students.

Experiments 2 and 3:
For the next two experiments, we compared the use of both machine learning and rules for knowledge tracing against several baselines: random prediction; "always predict 1" (since skill values of 1.0 are the most common in the ground truth, this performs better than random); using only machine learning (ML); and using only rules (R). For Experiment 2, we trained on Dataset A and tested on Dataset B, and for Experiment 3, we ran a leave-one-out cross-validation on the set of students using Dataset A. This leave-one-out policy is the same as the one employed in Experiment 1. Since predictions of skill value are numbers between 0 and 1, we used the Mean-Squared Error (MSE) as a measure of error (lower is better).

Table 1: Rules used to detect successful application of each of the skills required to master Parallel. Skills listed without a rule are assessed only through the machine-learning component.

- Hover over components to see what they do: player hovers over a component.
- Use help bar: player clicks on the help bar and reads one or more of the guides.
- Drag objects: player clicks and drags either a semaphore or a signal.
- Place objects on the track: player places either a semaphore or a signal on a track.
- Hover over side arrows to see different colored tracks: player hovers over the arrows on the side or clicks the side arrows.
- Remove unnecessary elements: player drags a semaphore or signal to the trash.
- Deliver packages: all required packages are delivered.
- Be able to link signals to direction switches: player links a signal to a direction switch.
- Be able to link semaphores to signals: player links a semaphore to a signal.
- Understand the use of semaphores.
- Understand that arrows move at unpredictable rates: player does not place multiple semaphores along one track without connecting them to anything, or does not move a semaphore significantly.
- Understand that events happen in different orders.
- Use diverters.
- Prevent starvation.
- Block critical sections: player places a semaphore and signals in the proper positions to block the critical section.
- Synchronize multiple arrows: player places semaphores and signals alternately on the tracks of the different arrows (a signal in arrow A's path is linked to the semaphore in arrow B's path, and vice versa).
- Alternating access with semaphores and signals.
- Testing before submitting: player tests before submitting.
- Understand specific delivery points: packages are delivered correctly without losing any.
- Understand exchange points: packages are transferred at delivery points.
- Deliver packages with multiple synchronized arrows.

[Table 2: Experiment 1: Classification accuracy for predicting problem-solving strategies (Dataset A). Columns: Time Interval (sec) (τ), AdaBoost, Bagging, Bayes Net, J48, Multilayer Perceptron, Naive Bayes, Random Forest, Average.]
Experiment 4:
We slightly modified PFA to enable an effective comparison against our approach. Normally, we would estimate the parameters $\beta_j$, $\gamma_j$, and $\rho_j$ for each skill $j$ using logistic regression. In our case, we assume $\beta_j$ remains constant (a common assumption in later PFA work) and learn the parameters $\gamma_j$ and $\rho_j$. We also compute performance for each skill. Thus, Equation 1 becomes:

$$m_j(u, KC, c, n) = \sum_{j \in KC} (\gamma_j c_{u,j} + \rho_j n_{u,j})$$

We note that training data must be binary to train PFA (via logistic regression). We used the ground truth from Dataset A because most of its ground-truth values were binary; any ground truth that was not binary was discarded during training. We also note that $c_{u,j}$ and $n_{u,j}$ were computed using the binary values from the ground truth of Dataset A. Thus, the performance reported for PFA assumes ideal conditions where this information is available (it is not available in realistic conditions in our domain), and should be taken as an upper bound on the performance we can expect to achieve.

Results
Experiment 1:
Table 2 provides accuracy measures for the different classifiers over the different time intervals. Overall, we see that larger time intervals achieve better results (seen in the average column on the right-hand side). The best performance was achieved using a Bayes Net with $\tau = 30$ seconds (55.63%). Notice this is a 3-way classification problem, so the baseline classification accuracy of a random predictor would be about 33%. Thus, there is definitely a signal in our dataset that can be used for player modeling.

Experiment 2:
Table 3 provides the MSE over all skills for Dataset B for machine learning only (ML), rules only (R), and machine learning combined with our rules (ML+R). We note that the random baseline has an MSE of 0.25, and the "always predict 1" baseline has an MSE of 0.1383. Most of the classifiers (except the Multilayer Perceptron using a 30-second time interval) were able to beat the baselines. We see that AdaBoost's and Bayes Net's MSE was constant over all time intervals for ML, R, and ML+R; this was due to both classifiers predicting every skill in a student's skill vector as 1.0. We also note that the MSE for R remained constant over all time intervals, implying that the time interval does not influence our rules. This makes sense because skills are detected directly from the telemetry information. The lowest MSE for ML and ML+R was achieved by J48 with a 30-second time interval (0.0917 for ML and 0.0811 for ML+R), beating R with an MSE of 0.1244. Looking closer at the MSE for each skill, we notice that the largest difference between ML and ML+R over any skill was 0.04, for "Place objects on the track." With fine-tuning of our rules, we can expect better performance.

[Table 3: Experiment 2: MSE for estimating student skill (Dataset B), for machine learning only, rules only, and machine learning + rules, over the three time intervals τ. Columns: Time Interval (sec) (τ), AdaBoost, Bagging, Bayes Net, J48, Multilayer Perceptron, Naive Bayes, Random Forest, Rules. For τ = 10 with machine learning only: 0.1383, 0.1124, 0.1383, 0.1016, 0.1383, 0.2146, 0.1010, with 0.1244 for rules only.]
Table 4: Experiment 3: MSE for estimating student skill (Dataset A).

Machine Learning (rightmost column: Rules only):

Time Interval (sec) (τ) | AdaBoost | Bagging | Bayes Net | J48 | Multilayer Perceptron | Naive Bayes | Random Forest | Rules
10 | 0.1098 | 0.2121 | 0.2109 | 0.1597 | 0.1975 | 0.2856 | 0.2199 | 0.1138
20 | 0.1098 | 0.1766 | 0.2271 | 0.1439 | 0.1803 | 0.3037 | 0.1882 | 0.1138
30 | 0.1098 | 0.1900 | 0.2156 | 0.1559 | 0.1814 | 0.1920 | 0.1567 | 0.1138

Machine Learning + Rules:

Time Interval (sec) (τ) | AdaBoost | Bagging | Bayes Net | J48 | Multilayer Perceptron | Naive Bayes | Random Forest
Experiment 3:
Table 4 provides the MSE over all skills for Dataset A for ML, R, and ML+R. The lowest MSE for ML was AdaBoost with 0.1098, the lowest for ML+R was AdaBoost with 0.0938, and R achieved 0.1138. We see that every classifier for ML+R, and most of the classifiers for ML, outperformed the random baseline. However, on this dataset, none of the classifiers for ML outperformed the "always predict 1" baseline, which was 0.0895; for ML+R, AdaBoost was the only classifier that came close, with an MSE of 0.0938. This suggests that, for this dataset, it is better to just always predict that a student knows all the skills with likelihood 1.0. The ground-truth annotations were provided by hand by researchers on our team, and only those annotations in which two separate researchers were confident were kept. Because of this, most annotations are 1.0 values (in our game it is easier to assess that a student knows something than the opposite). This makes the dataset skewed, and results on it concerning student-skill prediction are not very meaningful, compared to those from Dataset B, where the ground truth was annotated directly by the students. However, we include them for completeness.
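The error measure used in Experiments 2 and 3 is straightforward to compute; below is a sketch with hypothetical skill vectors (the skill names and values are illustrative), including the "always predict 1" baseline.

```python
def mse(predicted, ground_truth):
    """Mean-squared error over all skills for which ground truth exists."""
    pairs = [(predicted[s], ground_truth[s]) for s in ground_truth]
    return sum((p - g) ** 2 for p, g in pairs) / len(pairs)

# Hypothetical skill vectors for one student (values in [0, 1]).
truth = {"use_semaphores": 1.0, "use_diverters": 0.5, "deliver_packages": 1.0}
ours  = {"use_semaphores": 0.9, "use_diverters": 0.6, "deliver_packages": 1.0}
always_one = {s: 1.0 for s in truth}  # the "always predict 1" baseline

print(round(mse(ours, truth), 4), round(mse(always_one, truth), 4))  # → 0.0067 0.0833
```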
Experiment 4:
We use the values from ML+R in Table 4 to compare against PFA. PFA could not issue predictions for all the skills, since for some of them there was no ground truth that was either 0 or 1, and thus we could not train the model using logistic regression. Assuming baseline 0.5 predictions for those skills, PFA achieves an error of 0.0655, lower than the best result we obtained (0.0938). Over the skills for which it could make predictions, PFA achieved an MSE of just 0.0450. Recall that in order to make PFA applicable, we fed part of the ground truth to it as input, which would not be available in realistic conditions. Thus, this gives us a lower bound on the MSE that we could expect to achieve if our approach perfectly identified all instances of successful and unsuccessful skill application. An immediate line of future work is to replace Equation 3 in our approach with PFA, which we expect will significantly improve results.
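The modified PFA used here can be sketched as follows. The weights below are hypothetical; in the experiments, $\gamma_j$ and $\rho_j$ come from logistic regression over Dataset A's binary ground truth, with $\beta_j$ held constant (zero in this sketch).

```python
import math

def modified_pfa_logit(skills, gamma, rho, c, n):
    """Modified PFA: the per-skill bias beta_j is held constant (zero here)
    and only the success weight gamma_j and failure weight rho_j are learned."""
    return sum(gamma[j] * c[j] + rho[j] * n[j] for j in skills)

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

# Hypothetical learned weights for one skill.
gamma, rho = {"block_critical": 0.5}, {"block_critical": -0.3}
p = sigmoid(modified_pfa_logit(["block_critical"], gamma, rho,
                               c={"block_critical": 4},   # prior successes
                               n={"block_critical": 2}))  # prior failures
print(round(p, 3))  # → 0.802
```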
Discussion:
Our experimental evaluation shows that our proposed approach, combining machine learning to predict problem-solving strategy with domain knowledge rules, outperforms the baselines in extracting information useful for knowledge tracing, and performs better than either of the two approaches in isolation. We notice that different machine-learning techniques provide better results under different time intervals. This might be due to differences in the feature vectors at the given time intervals.
Conclusions
This paper presents an approach to player knowledge tracing for an educational game. Our approach is based on integrating machine learning with domain knowledge rules that indicate when players successfully or unsuccessfully apply skills, and using those predictions to perform knowledge tracing. Our empirical analysis with data from real users shows that we can predict a student's understanding of skills with relatively low mean-squared error: much lower than the baselines, and very close to that achieved by PFA in an idealized situation where ground truth on the successful or unsuccessful application of skills was available. As part of our future work, we would like to expand our domain knowledge rules and connect this knowledge-tracing model with the PCG approach developed in our previous work (Valls-Vargas, Zhu, and Ontañón 2017).
Acknowledgements

This project is partially supported by Cyberlearning NSF grant 1523116.

References

[Baker, Corbett, and Aleven 2008] Baker, R. S. J. d.; Corbett, A. T.; and Aleven, V. 2008. More accurate student modeling through contextual estimation of slip and guess probabilities in Bayesian knowledge tracing. In Woolf, B. P.; Aïmeur, E.; Nkambou, R.; and Lajoie, S., eds., Intelligent Tutoring Systems, 406-415. Berlin, Heidelberg: Springer Berlin Heidelberg.

[Canossa 2013] Canossa, A. 2013. Meaning in gameplay: Filtering variables, defining metrics, extracting features and creating models for gameplay analysis. In Game Analytics, 255-283. Springer.

[Cen, Koedinger, and Junker 2006] Cen, H.; Koedinger, K.; and Junker, B. 2006. Learning factors analysis: A general method for cognitive model evaluation and improvement. In International Conference on Intelligent Tutoring Systems, 164-175. Springer.

[Corbett and Anderson 1994] Corbett, A. T., and Anderson, J. R. 1994. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction.

[Dempster, Laird, and Rubin 1977] Dempster, A. P.; Laird, N. M.; and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological).

[Downey 2008] Downey, A. B. 2008. The Little Book of Semaphores. Green Tea Press.

[Frank, Hall, and Witten 2016] Frank, E.; Hall, M. A.; and Witten, I. H. 2016. The WEKA Workbench. Morgan Kaufmann, fourth edition.

[Gong, Beck, and Heffernan 2010] Gong, Y.; Beck, J. E.; and Heffernan, N. T. 2010. Comparing knowledge tracing and performance factor analysis by using multiple model fitting procedures. In International Conference on Intelligent Tutoring Systems, 35-44. Springer.

[Harrison and Roberts 2012] Harrison, B., and Roberts, D. L. 2012. A review of student modeling techniques in intelligent tutoring systems. In Eighth Artificial Intelligence and Interactive Digital Entertainment (AIIDE) Conference.

[Horn, Cooper, and Deterding 2017] Horn, B.; Cooper, S.; and Deterding, S. 2017. Adapting cognitive task analysis to elicit the skill chain of a game. In Proceedings of the Annual Symposium on Computer-Human Interaction in Play, 277-289. ACM.

[Machado, Fantini, and Chaimowicz 2011] Machado, M. C.; Fantini, E. P.; and Chaimowicz, L. 2011. Player modeling: Towards a common taxonomy. 50-57. IEEE.

[Ontañón et al. 2017] Ontañón, S.; Zhu, J.; Smith, B. K.; Char, B.; Freed, E.; Furqan, A.; Howard, M.; Nguyen, A.; Patterson, J.; and Valls-Vargas, J. 2017. Designing visual metaphors for an educational game for parallel programming. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 2818-2824. ACM.

[Pavlik Jr, Cen, and Koedinger 2009] Pavlik Jr, P. I.; Cen, H.; and Koedinger, K. R. 2009. Performance factors analysis: A new alternative to knowledge tracing. Online Submission.

[Riedl et al. 2008] Riedl, M. O.; Stern, A.; Dini, D.; and Alderman, J. 2008. Dynamic experience management in virtual worlds for entertainment, education, and training. International Transactions on Systems Science and Applications, Special Issue on Agent Based Systems for Human Learning, (July): 43-48.

[Sleeman and Brown 1982] Sleeman, D., and Brown, J. S., eds. 1982. Intelligent Tutoring Systems. London: Academic Press.

[Smith et al. 2011] Smith, A. M.; Lewis, C.; Hullet, K.; and Sullivan, A. 2011. An inclusive view of player modeling. In Proceedings of the 6th International Conference on Foundations of Digital Games. ACM Press.

[Thue et al. 2007] Thue, D.; Bulitko, V.; Spetch, M.; and Wasylishen, E. 2007. Interactive storytelling: A player modelling approach. In Proceedings of the Third Artificial Intelligence and Interactive Digital Entertainment Conference.

[Valls-Vargas, Zhu, and Ontañón 2017] Valls-Vargas, J.; Zhu, J.; and Ontañón, S. 2017. Graph grammar-based controllable generation of puzzles for a learning game about parallel programming. In Proceedings of the 12th International Conference on the Foundations of Digital Games, FDG '17, 7:1-7:10. New York, NY, USA: ACM.

[Van Der Werf et al. 2003] Van Der Werf, E.; Uiterwijk, J. W.; Postma, E.; and Van Den Herik, J. 2003. Local move prediction in Go. In Computers and Games, 393-412. Springer.

[Weber and Mateas 2009] Weber, B. G., and Mateas, M. 2009. A data mining approach to strategy prediction. In CIG 2009: 2009 IEEE Symposium on Computational Intelligence and Games.

[Yannakakis et al. 2013] Yannakakis, G. N.; Spronck, P.; Loiacono, D.; and André, E. 2013. Player modeling. Dagstuhl Follow-Ups.