Avoiding Help Avoidance: Using Interface Design Changes to Promote Unsolicited Hint Usage in an Intelligent Tutor
Mehak Maniktala · Christa Cody · Tiffany Barnes · Min Chi
Abstract
Within intelligent tutoring systems, considerable research has investigated hints, including how to generate data-driven hints, what hint content to present, and when to provide hints for optimal learning outcomes. However, less attention has been paid to how hints are presented. In this paper, we propose a new hint delivery mechanism called "Assertions" for providing unsolicited hints in a data-driven intelligent tutor. Assertions are partially-worked example steps designed to appear within a student workspace, and in the same format as student-derived steps, to show students a possible subgoal leading to the solution. We hypothesized that Assertions can help address the well-known hint avoidance problem. In systems that only provide hints upon request, hint avoidance results in students not receiving hints when they are needed. Our unsolicited Assertions do not seek to improve student help-seeking, but rather seek to ensure students receive the help they need. We contrast Assertions with Messages, text-based, unsolicited hints that appear after student inactivity. Our results show that Assertions significantly increase unsolicited hint usage compared to Messages. Further, they show a significant aptitude-treatment interaction between Assertions and prior proficiency, with Assertions leading students with low prior proficiency to generate shorter (more efficient) posttest solutions faster. We also present a clustering analysis that shows patterns of productive persistence among students with low prior knowledge when the tutor provides unsolicited help in the form of Assertions. Overall, this work provides encouraging evidence that hint presentation can significantly impact how students use them, and that Assertions can be an effective way to address help avoidance.
M. Maniktala · C. Cody · T. Barnes · M. Chi
Department of Computer Science, North Carolina State University, Raleigh, North Carolina, USA
E-mail: [email protected]
Keywords intelligent tutoring system · help avoidance · user experience · unsolicited hints · aptitude-treatment interaction · logic proofs · productive persistence · clustering · problem solving

1 Introduction

Studies suggest that hints, when provided appropriately, can augment students' learning experience [15, 68] and improve their performance [11]. However, students may not use hints optimally [2, 31]; some abuse hints to expedite problem completion, and some avoid seeking help when they are in need [1, 65]. Our goal is to redesign the hint interface to solve this help avoidance problem. Considerable research has investigated hints from several perspectives, including hint generation [10, 63], adaptive hint content [22, 41, 83], student help-seeking behavior [2, 65], and hint timing [69]. However, few studies have specifically investigated how hint interfaces could reduce help avoidance (e.g. [41, 50]).

Most intelligent tutoring systems (ITSs) provide solicited hints on-demand, i.e., upon student request [84]. Other tutors try to circumvent help avoidance by providing unsolicited hints when the system "determines" they are needed, for example, after a long period of inactivity [32]. However, students often ignore these unsolicited hints [22, 55]. In this work, we designed a new interface for unsolicited hints, called Assertions, to address this issue, and compared its impact on student learning outcomes with that of Messages, text-based unsolicited hints that appear after student inactivity. The ultimate goal of our research is to combine the new Assertions interface with a data-driven method to determine when providing an unsolicited hint would be most beneficial and least disruptive for students.

Our Assertions interface was designed based on user experience and multimedia design principles, including contiguity [50], attention [34], expectation [76], and persuasion [20, 29]. First and foremost, we hypothesized that placing Assertions contiguously within the area of student attention would make unsolicited hints more noticeable. Second, we believed students could more quickly interpret Assertions based on the expectation set by formatting them like other problem-solving steps. Finally, we used persuasive language asking students to use the Assertions as problem-solving subgoals. These features help Assertions act as partially-worked example steps, so they may garner the same benefits as worked examples, which have been shown to improve learning efficiency [49]. We hypothesized that Assertions would reduce help avoidance for all students by increasing the percentage of times help was received when it was needed. Further, we hypothesized that Assertions would have an aptitude-treatment interaction effect, fostering productive persistence and improving posttest performance among students with low prior proficiency. Persistence during training that leads to mastery of a subject or positive posttest outcomes is called productive persistence [39].
The main contribution of this work is a principled design for a hint interface, Assertions, and a study showing that Assertions can be used to significantly reduce help avoidance for all students through interface design alone. Our new proposed Assertions appear as partially-worked example steps, reducing the barriers to help usage while leveraging the benefits of worked examples. The second contribution of this work is a new cluster-based method that combines posttest performance, effort (to quantify persistence), and unsolicited hint usage to discover productive persistence. Based on these clusters, we were able to show that students with low prior proficiency who received Assertions exhibit productive persistence. Since Assertions are automatically provided to students, they can be thought of from two perspectives: either as unsolicited hints, or as partially-worked example steps. Therefore, in our related work and design sections below, we discuss Assertions from both of these perspectives.
Factory, Fossati et al. [33] devised the Procedural Knowledge Model (PKM), which uses students' global problem-solving behaviors to generate data-driven feedback for the iList tutor for programming with linked lists. Price, Barnes, and colleagues extended the Hint Factory approach to generate data-driven hints for novice programming [63, 64, 67]. Later, Paaßen et al. created the Continuous Hint Factory to allow hint generation for previously unobserved states [60], while Price et al. devised the SourceCheck algorithm, which leveraged similar representations to generate hints based on a set of student solutions rather than the trace data that the original Hint Factory uses [62]. Rivers et al. developed a data-driven hint generator for ITAP (Intelligent Teaching Assistant for Programming) that uses a similar set of tools, including state abstraction, path construction, and state reification, to generate personalized hints [70]. This method extends the Hint Factory by enhancing the solution space and creating new edges for states that are disconnected. This allows the ITAP method to generate hints even for states that are not present in the prior data. For this work, we extended the Hint Factory to provide personalized hints for logic with 100% availability, as described in Section 3.

Aleven et al. have shown that students often display poor help-seeking behaviors within intelligent tutors, including help avoidance, where students could benefit from seeking help but choose not to, and help abuse, where students use help excessively when they could solve a problem without assistance [2]. Studies by Price et al., Almeda et al., and Roll et al. have confirmed that help avoidance is pervasive across domains and systems, with students ignoring hints [4, 66, 71]. In one study, Roll et al. showed that meta-cognitive feedback improved students' help-seeking skills but did not affect their domain learning [71].
Price et al.'s research study on help-seeking by novice programmers showed that students have several reasons for not requesting on-demand hints, including uncertainty about whether system help would be useful, or a desire to be independent [66]. Some tutoring systems prevent help avoidance by providing unsolicited hints rather than relying on student help-seeking through "on-demand" hint requests [5, 46, 56]. Arroyo et al. [5] and Murray et al. [56] showed that unsolicited hints promoted learning gains for a subset of students. However, a study by Muir and Conati showed that students often ignore unsolicited hints [55].

Several studies have tried to encourage students to use unsolicited help by changing its content or placement. For example, Cody et al. showed that unsolicited, data-driven hints were more likely to be used if their content focused on next-step hints rather than more abstract, high-level hints [21]. Conati et al. used eye-tracking to show that factors such as hint timing, and students' attitudes and prior knowledge, can affect students' attention towards unsolicited hints in a number factorization game [22]. Kardan and Conati showed that unsolicited hints with tailored hint content, along with highlighting and proximal hint placement, improved student learning in a controlled study with AISPACE [41].

Despite their potential benefits, we argue that attempting to understand or use hints, and especially unsolicited ones, can increase students' cognitive load while learning new concepts within a tutoring system. This is because students have to mentally integrate several sources of information, including on-demand hints, unsolicited hints, and the student's own current solution attempt. Adding to this is the fact that, in many existing tutoring systems, the hints and the student solution workspace are physically located in different areas of the interface.
As a result, we believe that by physically integrating those sources of information together, Assertions naturally reduce students' working memory load and thus would facilitate student learning by accelerating the changes in their long-term memory associated with schema acquisition [77, 78].

2.2 Worked Examples

Since we posit that Assertions can be seen not only as unsolicited hints, but from another perspective as partially-worked examples for single problem-solving steps, we discuss the impacts of worked examples here. Extensive research has shown that worked examples, i.e. showing step-by-step problem solutions, can be as effective as problem solving for learning the same content, yet the former generally need much less time [49, 54]. In our prior work, we added whole-problem worked examples to our tutor to help students learn the problem interface and problem-solving skills. In [54], we found that the students who received data-driven worked examples were much more likely to complete the tutor, and did so in less time. In another study [72], we found that when we used reinforcement learning (RL) to determine when to present whole-problem worked examples, the slow learners who received worked examples based on this RL policy had significantly higher learning gains than their peers who received worked examples at random. Further, our results from a study on worked examples in Deep Thought [43] show that whole-problem worked examples benefit students early in the tutoring, but are comparable to hint-based scaffolding. We also observed that worked examples were less beneficial later in the tutoring sessions for lower proficiency students. Our work with Pyrenees, a probability tutor, suggests that step-level worked examples can also promote learning [92]. This work suggests that students do not resist following these step-level worked examples, which are essentially unsolicited hints provided in the student workspace.

One mechanism proposed by Sweller et al.
for the success of worked examples is through reduction in cognitive load when students are learning new concepts [79]. Their work discusses the principles underlying cognitive load theory and how worked examples reduce the need for learners to engage in inference processes which might otherwise place heavy demands on students' working memory. On the other hand, much prior work found that asking students to justify their solution steps, referred to as self-explanation, can greatly improve their learning [3, 19, 24]. Furthermore, asking students to explain expert-designed worked examples can be more effective than problem solving alone [18, 88]. For example, Weerasinghe and Mitrovic explored the impact of self-explanations in KERMIT-SE, a tutor for the open-ended domain of database design. They engaged students in tutorial dialogues upon errors in
solutions and found that it improved student performance in both conceptual and procedural knowledge [88, 89]. In this work, we design our new Assertions hint interface to act as expert-designed partially-worked example steps with self-explanations. However, there are two key differences between our work and that by Weerasinghe and Mitrovic: Assertions are provided to guide students on the next step instead of the current step, and they are provided after correct steps instead of incorrect steps. As described in Section 3, Assertions provide students with the content of a useful step, but students must provide an explanation before they can use the hint content in their solutions.

2.3 Aptitude-Treatment Interaction

Prior research in instructional strategies has shown the existence of aptitude-treatment interaction (ATI), where certain students are more sensitive to variations in the learning environment compared to less sensitive students who perform regardless of the treatment [25, 74]. Researchers have explored the complex relationship between student aptitude and their interaction with unsolicited help. While Razzaq et al. found that students learned more reliably with hints they requested than with unsolicited hints [69], Arroyo et al. observed higher learning gains for low performing students when unsolicited hints were provided [5]. Further, Murray et al. found that unsolicited help avoided the negative effects of frustration and saved students time when they were struggling [56]. Muir and Conati showed that students with low prior knowledge are likely to need hints the most, but they do not look at the hints as often [55]. Kardan and Conati found that changes in unsolicited hint content and interface had a more pronounced effect on learning for students with lower initial knowledge [41].
Similar to these studies, we hypothesize that an improved interface for unsolicited hints can increase hint usage and outcomes, especially for students with low prior knowledge. In this work, we believe that students whose initial tutor performance is lower may need more assistance to develop strategies for solving logic proofs, and therefore may benefit more from an improvement in the hint interface.

2.4 Productive Persistence

Recently, there has been increased interest in non-cognitive skills like persistence and self-control within education research [39]. Task persistence is defined as the continuation of a task despite difficulty. To quantify persistence, researchers have used metrics of effort [28]. However, not all persistence is productive. Beck and Gong [12] define unproductive persistence, or "wheel spinning," as when a student spends an excessively long time struggling to learn a topic without achieving mastery. They showed that if a student did not master a skill in ASSISTments (an online math learning platform) or the Cognitive Algebra Tutor in a reasonable amount of time, the student was likely to struggle and never master the skill. Their work presents connections between wheel-spinning and negative student behaviors such as disengagement and gaming, as well as recommendations to improve ITS design to address these issues. Nelson et al. are well known for their heuristic model of the help-seeking process, where they suggest that unproductive persistence may be associated with help avoidance [58]. Studies suggest that persistent effort that leads to mastery of a topic is productive persistence [39], and it is often associated with short-term outcomes like improvement in performance [13, 61], and longer-term outcomes in higher education and future earnings [27, 35]. Recent studies in educational data mining have attempted to predict when an intervention can help students by distinguishing between productive and unproductive behavior using decision trees [39] and Recurrent Neural Networks (RNNs) [14]. The work by Kai et al. on ASSISTments used decision trees to identify when students are struggling and how to make students' persistence more productive. They found that interleaved practice of different skills is more advantageous than blocked practice, where the opportunities to learn a given skill are massed one after another. Another study on ASSISTments by Botelho et al. used RNNs to detect stopout (low persistence) and wheel-spinning (unproductive persistence) early in order to intervene and prevent unproductivity. They found that these models have high AUC and are also able to learn a set of features that generalize to predict each other. In this paper, we apply clustering to discover patterns of productivity, persistence, and unsolicited hint usage in our tutor.

Fig. 1: Tutor's interface: student workspace (left), rules (middle), info box (right), and the Hint button and message box (bottom-left)
Fig. 2: A sample solution of a training problem in Deep Thought
Deep Thought (DT, Figure 1) is an intelligent tutor for solving open-ended, multi-step propositional logic problems that has data-driven features including next-step hints [8, 75], as well as adaptive problem selection [51, 53] and pedagogical policies for worked example presentation induced via reinforcement learning [6, 52, 72, 73]. Figure 1 shows the current tutor interface: the left window is the workspace where students construct solutions, the central window lists the domain rule buttons, and the right window provides instructions and information such as the rules that are meant to be practiced in the current problem. Each problem-solving statement is graphically represented as a node. Deep Thought shows several problem-provided statements (that are meant to be used as existing or known facts) at the top of the workspace, and a conclusion to derive at the bottom. Students iteratively carry out problem-solving steps by deriving new statements from old ones using domain rules. This is a typical procedure used across STEM domains: applying principles or rules to known information to derive new facts [59]. For example, in physics, if we know values for mass (m) and acceleration (a), we can apply the rule F = ma with those values to find force (F). In this paper, a problem-solving step consists of a new derived statement and its justification, where the justification includes specifying the domain rule and the source statements used to show that the new derived statement is true. In logic, problem solving continues until the conclusion is the derived statement in a step that is justified.

Figure 1 shows an example problem with four nodes 1-4 for the problem-provided statements (2: B, 1: A → C, 3: C → E, and 4: D ∧ ¬E) at the top of the workspace. The conclusion to be derived (C: ¬A ∧ B) is at the bottom, with a question mark indicating that it is not yet justified. Each problem-solving step involves the same process: clicking on 1-2 source nodes and a rule button, and entering the new derived statement. The tutor verifies whether the source nodes and rule correctly justify the derived statement. Once a step is verified, a new node appears, colored based on how often the same node was necessary in previous student solutions to this problem, where green means frequent, yellow is infrequent, and gray is never. We call a node 'necessary' or 'needed' when its deletion would make a solution incomplete. These colorings give students an indication of whether they are on an optimal problem-solving path.

We now walk through the student experience of solving the problem shown in Figure 1 to obtain the solution shown in Figure 2. First, the student clicks on node 4 and rule Simp, and is asked to type the new derived statement, D. The tutor verifies that Simp applied to node 4 is a correct justification, and draws node 5, labeled with Simp and an arrow from node 4 to 5. Node 5 is colored gray since it was never needed by previous students solving that same particular problem. Next, the student applies the same process to derive and justify node 6, which is green since it was frequently necessary in historical solutions. To derive node 7, the student clicks on node 1 and the Impl rule, and types in the derived statement ¬A ∨ C. After it is verified, node 7 appears, with the label Impl and an arrow from node 1 to 7. The student then clicks "Get Hint" to request a hint, and "Try to derive ¬C" appears in the message box. Next, the student tries to follow the hint by selecting nodes 3 and 6 and the rule MP. The tutor detects this incorrect rule application, records the error in the data log, and provides an error-specific message, but since it was a mistake, no new node is created. Since nodes 3 and 6 are still selected, the student clicks on the correct rule, MT, and types in the derived statement ¬C.
This process correctly justified the hint content statement ¬C, so node 8 appears with MT and arrows from nodes 3 and 6. The student similarly clicks on nodes 7 and 8 and rule DS to derive node 9. Finally, the student clicks on nodes 2 and 9 and rule Conj to derive the conclusion, and the tutor detects that the problem is complete.

3.1 Hints in Deep Thought

Deep Thought uses the Hint Factory [75] to generate hints, where the hint content depends only on the current problem-solving state, a snapshot of a student problem-solving attempt. The Hint Factory works by treating problem-solving data from prior students as a Markov Decision Process and using value iteration to assign values to each state based on its distance from a valid observed solution. Then, the hint source is set for a current student's state by selecting the subsequent reachable state with the highest value. If the current state is not found, we roll back current student solution states until a matching state and its hint source are found. Finally, the Hint Factory-derived hint content is the newest derived statement in the hint source state. Deep Thought inserts this derived statement, the hint content HC, into a template depending on the hint type, described below.

Fig. 3: Differences between Assertions and Messages while delivering a logic hint statement A → E. (a) The Assertion is presented in the workspace, with the format of a student-derived step, and with a "Subgoal" label. (b) The Message hint is provided textually below the student workspace.

In this study, there are three types of hints, including on-demand hint requests and two types of unsolicited hints: Messages and Assertions. The content of on-demand and unsolicited hints is identical, and no additional justification/derivation help is given. Students request on-demand hints by clicking the "Get Hint" button, and the system shows "Try to derive HC" in the message box. Both Messages and Assertions are unsolicited hints, meaning that they are not requested by students. Messages appear automatically after one minute of student inactivity, using the same Messages interface as on-demand hints.
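The Hint Factory's value-iteration and hint-source selection described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the reward, discount, and data layout (a `transitions` map from each observed state to its observed successor states) are our own assumptions.

```python
def value_iteration(states, transitions, goal_states, gamma=0.9, tol=1e-6):
    """Assign each state a value reflecting its closeness to a valid solution."""
    V = {s: (100.0 if s in goal_states else 0.0) for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in goal_states or not transitions.get(s):
                continue  # goal values stay fixed; dead ends keep their value
            # Taking a step costs -1; the best successor's value is discounted.
            new_v = -1.0 + gamma * max(V[t] for t in transitions[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

def hint_source(state, transitions, V):
    """The hint source is the reachable successor state with the highest value."""
    successors = transitions.get(state)
    return max(successors, key=V.get) if successors else None
```

The hint content would then be read off as the newest derived statement in the returned hint-source state.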
Assertions appear automatically after about 40% of steps. Since the mean student solution length in the training problems is 9 steps, this means that students are likely to encounter 3-4 hints per problem. The Assertions interface consists of 4 parts: (1) adding a new cyan-colored node containing the hint content HC in the workspace, (2) labeling the node as a "Subgoal," (3) including a question mark icon showing that the node is not yet justified, and (4) stating "Try to justify the added goal" in the message box. Figure 3 shows the Messages and Assertions interfaces suggesting the same logic statement A → E in different formats. Students must explain how the node is to be derived by justifying it before they can use the hint content in their solutions. While this is not a typical verbal self-explanation, we argue that, by justifying the step, the student is demonstrating that they know what domain principle (rule) and prior statements can be used to explain why the new derived statement is true. In the next section, we describe the design principles used to create the new Assertions interface.

• Contiguity and Attention: Moreno et al.'s spatial contiguity principle for multimedia learning materials states that a graphic should not be physically separated from its explanatory text [26, 50]. Hegarty et al. showed that contiguity supports student memory and understanding [36]. Butcher and Aleven showed that when interactive support was placed near a geometry diagram, student learning outcomes improved [16, 17]. Kardan and Conati, in a controlled study on AI SPACE, tailored the hint content and used hint highlighting and proximal hint placement to draw students' attention towards unsolicited hints, which improved learning for low prior knowledge students [41]. In this work we use similar proximal hint placement for Assertions but provide the same content in both Assertions and Messages. We strategically place Assertion hints where the student needs them.
Although the message box is close to the workspace, it may still be subject to 'change blindness' [34], where students paying attention to nodes within the workspace may filter Messages out and simply not notice their appearance. Therefore, we provide Assertions in the workspace, where students have already focused their attention. Together, contiguity and attention are meant to help students notice the appearance of Assertion hints.

• Expectation: Research by Summerfield explains that the speed of visual interpretation is optimized by leveraging past experiences to form expectations [76]. Based on this principle, we design Assertions to leverage student expectations through an isomorphic visual format that may work together with reduced text to decrease cognitive load. First, the hint content HC of Assertions appears in the same visual node format as student-derived statements, enabling students to visually interpret an Assertion hint faster. Second, Messages require students to read the text "Try to derive HC" and determine that HC is a statement that should appear on a graphical node. This additional cognitive processing may pose a barrier that some students may not overcome [80], and this may be especially true for students with low prior knowledge [40]. Therefore, formatting the Assertions hint content HC as nodes may help students by leveraging visual expectation, or by reducing overall cognitive load [80].

• Persuasion: Dillard suggests that user experiences can be enhanced by using persuasion [29]. Cialdini has created six principles of influence, including reciprocity, commitment and consistency, liking, social proof, authority, and scarcity, that can be used to influence people's behaviors [20]. Assertions have two persuasive design aspects. First, we posit that adding Assertions directly to the workspace may make them seem required, leveraging the authority of the tutoring system itself.
Assertion nodes are accompanied by a "Subgoal" label (Figure 3a) and the message "Try to justify the added goal", persuasive and authoritative texts suggesting that justifying Assertions is just part of using the tutor. The difference in the text accompanying Assertions and Messages is that an Assertion is called a "goal" but Message hints do not use that terminology while providing hints. Second, Assertion nodes are also formatted with a question mark like the conclusion. This formatting leverages both the visual expectation principle above and Cialdini's consistency notion that people prefer to be consistent. Once they get used to following tutor instructions and justifying nodes that have question marks, Assertions can rely on people's natural consistency, which influences them to continue to make similar consistent choices. Previous studies on help-seeking and hint usage suggest that students have many different reasons for help avoidance, including their attitudes towards hints and their preference for autonomy [65]. Persuasive design elements may circumvent these preferences by simply influencing students to do what is suggested.
Based on our foundational design principles and literature review, we propose the following three hypotheses: (H1) Assertions will increase unsolicited hint usage for all students irrespective of their prior knowledge. (H2) Assertions will lead students with low prior knowledge to form shorter proofs faster in the posttest. (H3) Assertions will foster productive persistence among students with low prior knowledge.

4.1 Participants

The study was conducted with 122 participants at North Carolina State University, the top engineering university in the state, where Deep Thought was given as a homework assignment to a class of 312 undergraduate students in the College of Engineering majoring in Computer Science, Computer Engineering, or Electrical Engineering in a Fall 2018 discrete mathematics course. We do not have specific demographics of study participants, but the Fall 2018 College of Engineering demographics include 25.3% women, 67.2% white.

4.2 Conditions

We used stratified sampling to split students based on their pretest performance, and then randomly assigned them to the conditions, with Assertions as the treatment and Messages as the control. The condition assignment resulted in N = 73 in Assertions and N = 49 in Messages. The total number of participants who completed the study was 105 (61 in Assertions, 44 in Messages), but after removing logs with system errors, the dataset had 100 students, with 57 in Assertions and 43 in Messages. We performed a χ² test of independence to examine the impact of completion rate and system errors on the groups and found no significant differences among the two groups: χ²(2, N = 122) = 1., p = 0.91. This implies that the group sizes were not significantly impacted by the tutor completion rate or logging errors.

4.3 Procedure

The student procedure is as follows: The tutor provides students with practice solving logic problems, divided into four sections: introduction, pretest, training, and posttest. The introduction presents two worked examples to familiarize students with the tutor interface. Next, students solve two problems in a pretest, which is used to determine students' incoming competence. Students are assigned a condition based on their pretest performance. The pretest problems are designed to be easy and short, using a few straightforward rules, and this is reflected in their short optimal solution lengths (Mean = 3., SD = 0.). Students then enter a training section with five training levels with gradually increasing difficulty, and this is reflected in the average length of optimal solutions during training, with a mean optimal solution length of 4.99 steps (SD = 1.). Each level has four training problems. Students may skip a maximum of three problems per level, with each skip taking students to easier problems. Students may also restart problems using the "Restart" button below the workspace. In both conditions, students in the training levels may request on-demand hints and always receive immediate feedback on rule application errors (see Section 3). Students in the Messages (control) condition received unsolicited message hints upon one minute of inactivity. Students in the Assertions (treatment) condition were given Assertions after about 40% of their steps. The algorithm we use to provide Assertions uses two steps. In the first step, we decide at random whether the step should get a hint, with 50% probability. In the second step, we check the constraint that Assertions should not be given in more than two consecutive steps. This resulted in an actual Assertion provision rate of 40%. Note that both Messages and Assertions remain on the screen until a student justifies them. Further, only one unsolicited hint, regardless of interface, may be present at a time, and the hint content is not updated based on new student work. Finally, students take a more difficult posttest with four problems, with longer optimal solution lengths compared to the other sections (Mean = 7., SD = 1.). (The tutor allows students to delete Assertions, but only two Assertions were deleted in the entire dataset, suggesting that students did not realize this was possible.)

Fig. 4: Example scenarios of Assertion hint A → E usage. (a) The Assertion A → E node appears in the student workspace; if it is never justified, it remains as-is. (b) The student has justified the hint by selecting nodes 1 and 3 and rule HS. (c) A student solution where hint A → E was justified but not needed. (d) Another student solution where the hint was both justified and needed.

A hint is justified when a student applies rules to existing statements to derive it.
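The two-step provision policy above can be sketched as follows. This is a hypothetical illustration, not the tutor's code; the function and parameter names are our own.

```python
import random

def should_give_assertion(history, p=0.5, max_consecutive=2):
    """Decide whether the current step receives an Assertion.

    history: booleans for previous steps (True = an Assertion was given).
    Step 1: flip a coin with probability p.
    Step 2: veto the hint if the previous max_consecutive steps all
    received Assertions, so no run of Assertions exceeds that length.
    """
    if len(history) >= max_consecutive and all(history[-max_consecutive:]):
        return False
    return random.random() < p
```

Simulating this policy over many steps yields an overall provision rate somewhat below the 50% coin-flip rate, close to the roughly 40% actually observed, because the consecutive-step constraint vetoes a fraction of the flips.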
Figure 4a shows an Assertion suggesting A → E. When a student selects nodes 1 and 3 and the rule HS to derive the hint A → E, the Assertion hint A → E is said to be justified, and it becomes a numbered node 5 as in Figure 4b. The student may continue to solve the problem as in Figure 4c, without ever having used node 5 to justify any other node. As in this case, whenever an Assertion was justified but could be deleted without making the solution incomplete, we say the Assertion was justified but not needed. Another student may solve the problem as in Figure 4d, where the same hint statement A → E is both justified and needed: if we remove node 5 from the solution, it becomes incomplete, since nodes 7: ¬A and C: ¬A ∧ B could not be derived without it. We assume that if students justify a hint, they have paid attention to it. The Hint Justification Rate (HJR) is defined as the number of hints justified divided by the total number of hints given across the training problems. As in other multi-step open-ended problem domains, students may derive several statements that are not needed to solve a problem, making the solution longer than necessary. For a hint to be called needed, students must first justify it, but must also figure out how they can use it to derive the conclusion.
The Hint Needed Rate (HNR) is defined as the number of hints needed divided by the total number of hints given across the training problems. We use unsolicited HJR to evaluate student attention towards unsolicited help, and unsolicited HNR to measure the influence of unsolicited hints on student problem solving.

4.5 Performance Measures

Our test performance measures include solution length optimality, problem-solving time, and rule application accuracy. In open-ended domains, solution length, i.e., the number of derived statements in a complete solution, is a valuable performance metric, as there is a vast diversity of possible student solution paths. Our aim in increasing unsolicited hint usage is to guide students to learn efficient problem-solving strategies by incorporating the partially worked example Assertion steps as necessary statements in their solutions. Since the posttest consists of four problems, we evaluate students based on their average solution length in the posttest, and shorter lengths are better. (Note that solution length can only be calculated for complete solutions, and our data consists only of students who successfully completed the study by finishing the mandatory pre- and post-test problems; N = 5 (10%) in Messages and N = 12 (16%) in Assertions did not finish the tutor. A chi-square test shows no significant difference in the completion and non-completion group sizes between the two conditions: χ²(1, N = 122) = 0.…, p = 0.….)

Problem-solving time is also an important performance metric in open-ended domains. Similar to other studies [41, 81], we assess students on the total time they spend solving problems. To account for outliers when calculating problem-solving time, we cap each click-based interaction time at five minutes, i.e., if a student took more than five minutes to perform an interaction, we cap it at five. (The 99th percentile of interaction time in Fall 2018 was 99.03 s; 811 out of 260,750 interaction logs for the 100 students in the study had an action time greater than 5 min.) A shorter problem-solving time suggests better performance.
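Concretely, the two usage rates and the interaction-time cap can be computed as in the sketch below. The record format is hypothetical; the tutor's real logs differ:

```python
def hint_usage_rates(hints):
    """Compute Hint Justification Rate (HJR) and Hint Needed Rate (HNR).

    `hints` holds one record per unsolicited hint given, e.g.
    {"justified": True, "needed": False}. Hints needed are a
    subset of hints justified.
    """
    total = len(hints)
    if total == 0:
        return 0.0, 0.0
    hjr = sum(h["justified"] for h in hints) / total
    hnr = sum(h["needed"] for h in hints) / total
    return hjr, hnr

def capped_interaction_time(seconds, cap=300):
    """Cap a single click-based interaction at five minutes (300 s)."""
    return min(seconds, cap)

hints = [
    {"justified": True, "needed": True},
    {"justified": True, "needed": False},
    {"justified": False, "needed": False},
    {"justified": True, "needed": True},
]
hjr, hnr = hint_usage_rates(hints)  # hjr = 0.75, hnr = 0.5
```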
We hypothesized that increased usage of unsolicited hints will help students learn to solve problems more quickly and with shorter solution lengths, and that these effects will be more pronounced for students with low prior knowledge.

Finally, Accuracy is defined as the number of correct rule applications divided by the total number of applications. A higher accuracy value suggests better knowledge of how to apply domain rules. Since the tutor is designed to provide immediate feedback on incorrect rule applications without penalties, even within the pre- and post-tests (see Section 3), we do not hypothesize differences in accuracy between the two conditions. We report accuracy for both conditions, however, for completeness.

4.6 Prior Proficiency

We hypothesize that an increase in unsolicited hint usage significantly impacts the performance of students with low prior knowledge. Our prior work [72] suggests that students with different incoming competencies can experience a treatment differently. To account for such aptitude-treatment interaction effects, we quantify prior knowledge by splitting the students into Low and High Prior Proficiency groups using a normalized pretest performance score that combines the number of problem-solving steps, the average time spent on each step, and accuracy. The three performance measures are normalized separately and equally weighted in a combined score that is again normalized. Students with pretest performance above the cutoff were assigned to the High group.

Our measurement of effort is highly motivated by prior research. More specifically, Ventura et al. defined a metric for students' effort as the amount of time spent on unsolved problems, and they found a significant correlation between the effort measured during training and a self-report measure of persistence [86]. Later, in another study, they used this effort metric to measure persistence in an educational game that teaches qualitative physics [85]. In our tutor, students can skip up to three problems per training level, and thus we also measure the time students spent on these unsolved skipped problems as a measure of effort. Moreover, Dumdumaya et al. defined an effort metric, the number of reattempts made on a problem after a failed attempt, that predicted task persistence [30]. In our tutor, this corresponds to the number of restarts on problems that students eventually solve. In the following, we separately track effort through two research-based measures: (1) time spent on unsolved (skipped) training problems, and (2) the number of restarts on solved training problems. For the purpose of this analysis, we define productive persistence as persistent (high) effort that results in higher posttest performance.
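The prior-proficiency split described above can be sketched as follows. The exact normalization and the median cutoff are our assumptions; lower step counts and times are treated as better:

```python
import numpy as np

def proficiency_split(steps, avg_step_time, accuracy):
    """Median-split students into Low/High prior proficiency from a
    combined, re-normalized pretest score (a sketch of the procedure
    described above, not the study's exact formula)."""
    def z(x):
        x = np.asarray(x, dtype=float)
        s = x.std()
        return (x - x.mean()) / (s if s > 0 else 1.0)
    # Fewer steps and less time per step are better, so negate them.
    score = z(-z(steps) - z(avg_step_time) + z(accuracy))
    return np.where(score > np.median(score), "High", "Low")
```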
5 Results

After cleaning the data as described in Section 4.2, an average of 2,483 interactions were logged and analyzed per student in our final sample of 100 students (57 in the Assertions condition and 43 in Messages). We partitioned the students based on Prior Proficiency into Low (n = 41) and High (n = 59) groups. We then partitioned by Condition and Prior Proficiency, resulting in four groups: Assertions-Low (n = 25), Assertions-High (n = 32), Messages-Low (n = 16), and Messages-High (n = 27).

Before investigating any of our hypotheses, we first compared the number of on-demand hints between the two conditions to ensure that any differences between groups could not be explained by differences in on-demand hint requests. Similar to other tutors [47, 65], students in this study rarely requested on-demand help, irrespective of Condition or Prior Proficiency. We found no significant differences in the number of on-demand hint requests between conditions or by prior proficiency, with all conditions requesting, on average, less than one on-demand hint per problem. Students in the Assertions condition requested few on-demand hints per problem (Mean = 0.79, SD = 3.92; Assertions-Low: Mean = 0.67, SD = 2.82; Assertions-High: Mean = 0.89, SD = 3.07). Students in the Messages condition similarly requested few on-demand hints per problem (Mean = 0.55, SD = 2.72; Messages-Low: Mean = 0.46, SD = 2.36; Messages-High: Mean = 0.59, SD = 2.43).
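Before choosing between parametric and rank-based tests, normality of such count data is screened; a SciPy sketch on synthetic, zero-inflated counts (not the study's data):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
# On-demand hint counts are heavily skewed toward zero, so a
# Shapiro-Wilk test is expected to reject normality, motivating
# rank-based procedures such as the Aligned Ranks Transformation ANOVA.
counts = rng.poisson(0.7, size=57).astype(float)
w_stat, p_value = shapiro(counts)
```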
The on-demand hint data was not normally distributed, as tested by Shapiro-Wilk's test (Assertions: W = 0.744, p < 0.05; Messages: W = 0.752, p < 0.05). A two-way Aligned Ranks Transformation ANOVA with the two factors Condition {Assertions, Messages} and Prior Proficiency {Low, High} on the number of on-demand hints shows no significant main effects (Condition: F(1,100) = 0.132, p > 0.05; Prior Proficiency: F(1,100) = 1.075, p = 0.302) or interaction (F(1,100) = 0.006, p = 0.940). Based on this analysis, the remaining analyses focus only on usage of unsolicited Assertion and Message hints.

5.1 H1: Assertions increase the unsolicited hint usage for all students irrespective of their prior knowledge

Table 1 shows the unsolicited hint metrics. Since the hint data is not normally distributed, we performed a two-way Aligned Ranks Transformation ANOVA [90] on each of the unsolicited hint metrics with the two factors Condition {Assertions, Messages} and Prior Proficiency {Low, High}. We found a significant main effect of Condition on the number of unsolicited hints given (F(1,100) = 40.26, p < 0.05), as well as on HJR (F(1,100) = 191.10, p < 0.05) and HNR (F(1,100) = 62.30, p < 0.05). The main effects of Prior Proficiency were not significant (Hints Given: F(1,100) = 0.008, p = 0.929; HJR: F(1,100) = 0.221, p = 0.639; HNR: F(1,100) = 0.009, p = 0.924). The distribution parameters for each of the unsolicited hint metrics per Prior Proficiency group are provided in Appendix A. There was only a main effect of the Condition, as shown above. This shows that the Assertions had a significant impact on unsolicited hint usage for all students, regardless of incoming proficiency, confirming hypothesis H1.

Table 1: Comparison of unsolicited hint metrics between the two conditions, where a two-way Aligned Ranks Transformation ANOVA for each metric shows only a main effect of Condition (p < 0.05)

Unsolicited Hint Metric          Assertions      Messages
Hints Given in Training          48.82 (9.85)*   32.74 (10.64)
Hint Justification Rate (HJR)    0.93 (0.07)*    0.63 (0.18)
Hint Needed Rate (HNR)           0.82 (0.09)*    0.62 (0.17)

(HJR and HNR are the proportions of hints justified and needed, respectively. Shapiro-Wilk's tests: Unsolicited Hints Given, Assertions W = 0.904, p < 0.05, Messages W = 0.942, p = 0.030; Unsolicited HJR, Assertions W = 0.887, p < 0.05, Messages W = 0.959, p < 0.05; Unsolicited HNR, Assertions W = 0.904, p < 0.05, Messages W = 0.945, p < 0.05.)

Table 2: Comparison of Posttest Performance metrics between the two conditions within each Prior Proficiency group. Average Solution Length (p = 0.033) and Total Time (p = 0.008) are significantly different between the Assertions-Low and the Messages-Low groups.

5.2 H2: Assertions will lead students with low prior knowledge to form shorter proofs faster in the posttest

Since all performance data were normal, we performed t-tests to compare conditions. A t-test on the average pretest solution length between the Assertions (Mean = 7.54 nodes, SD = 1.87 nodes) and Messages (Mean = 7.64, SD = 2.21) conditions showed no significant difference (t(99) = 0.791, p = 0.215). We also observed no significant difference in pretest problem-solving time (t(99) = 0.683, p = 0.248) between the Assertions (Mean = 27.16 min, SD = 8.29 min) and Messages (Mean = 25.89 min, SD = 10.23 min) conditions. While the H2 hypothesis does not predict differences in accuracy between conditions, students were assigned to a condition based on their pretest performance, which includes rule application accuracy, so we compare it here. A t-test on pretest rule accuracy between the Assertions (Mean = 0.52, SD = 0.16) and Messages (Mean = 0.52, SD = 0.14) conditions shows no significant difference (t(99) = 0.111, p = 0.455).

As mentioned earlier, hypothesis H2 is based on the reasoning that Assertions may guide students towards optimal strategies, which can lead students with low prior proficiency to form shorter solutions in less time. We examined the correlation between the dependent variables (average solution length and total time) to assess their overlap, both for the entire population and for the low prior proficiency group.
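The pretest equivalence checks above are independent-samples t-tests; a SciPy sketch with synthetic group data (the group sizes mirror the study, the values do not):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
# Two groups drawn from nearly identical pretest distributions,
# mirroring the equivalence check between conditions.
assertions_len = rng.normal(7.5, 1.9, 57)
messages_len = rng.normal(7.6, 2.2, 43)
t_stat, p_two_sided = ttest_ind(assertions_len, messages_len)
```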
We did not observe a significant correlation between the average posttest solution length and posttest time, either for the entire population (Corr = 0.050, p = 0.615) or for the Low Prior Proficiency group (Corr = 0.015, p = 0.916).

Table 2 shows the posttest performance of the two Conditions {Assertions, Messages} disaggregated for the Low and High Prior Proficiency groups in the first two rows, and for All students as a summary in the bottom row. To investigate our H2 hypothesis, we performed a two-way ANCOVA on average solution length and total time, with Condition {Assertions, Messages} and Prior Proficiency {Low, High} as the two factors and the respective pretest performance metric as the covariate. For average solution length, we observed a significant interaction between Condition and Prior Proficiency (F(1,100) = 4.983, p = 0.027). Neither main effect, for Condition or Prior Proficiency, was significant. We then performed the pairwise Tukey's Honestly Significant Difference (HSD) test for multiple comparisons and found a significant difference (p = 0.033) between the Assertions-Low and Messages-Low groups, showing that the Assertions-Low group formed significantly shorter proofs on the posttest than the Messages-Low group.

A two-factor ANCOVA on posttest total time as described above shows a significant interaction between Condition and Prior Proficiency (F(1,100) = 6.236, p = 0.014), and a significant main effect of Condition (F(1,100) = 6.913, p = 0.010). The main effect of Prior Proficiency was not significant. A pairwise Tukey's HSD test for multiple comparisons on the total posttest time shows a significant difference (p = 0.008) between the Assertions-Low and Messages-Low groups: the Assertions-Low group spent significantly less time on the posttest than the Messages-Low group. Figure 5 summarizes the differences between the Assertions-Low and Messages-Low groups in their posttest performance. Together with the results above, the Assertions-Low group had significantly better posttest solution length and time than the Messages-Low group, confirming our H2 aptitude-treatment interaction hypothesis for posttest performance. While we did not hypothesize improvements in posttest accuracy, we provide these results in Appendix B for completeness.

Fig. 5: Tukey's HSD shows that the Assertions-Low group performed significantly better in the posttest than the Messages-Low group in average solution length (p = 0.033) and total time (p = 0.008)

Next, we investigated the correlation between average posttest solution length and the unsolicited hint metrics.

Table 3: Correlation between average posttest solution length and unsolicited hint metrics for the entire population, and split by low and high prior proficiency groups

Table 4: Correlation between total posttest time and unsolicited hint metrics for the entire population, and split by low and high prior proficiency groups

First, the top row of Table 3 shows that the number of unsolicited hints given does not correlate with posttest solution length, suggesting that differences in posttest solution lengths between conditions cannot be attributed to the frequency of unsolicited hints. However, both HJR (second row) and HNR (third row) are significantly and negatively correlated with posttest solution length for students with Low Prior Proficiency (HJR: p < 0.05; HNR: p < 0.05). This suggests that students with low prior knowledge learn more from the hints needed, rather than from the ones they only justified (see Figure 4 differentiating hints justified and needed). A justified, but not needed, hint suggests that a student could determine how to derive the unsolicited hint content, but not how to use it. It is reasonable that lower prior proficiency students who were able to include the unsolicited hints as necessary components of their proof solutions were more likely to learn more optimal, shorter problem-solving strategies. We also observed a non-significant but positive correlation between average posttest solution length and HNR for the High prior knowledge group. While small and not significant, this inverted effect may indicate another aspect of aptitude-treatment interaction, where high prior proficiency students may potentially learn less if they take too much advantage of unsolicited hints. This result suggests that it may be preferable to build a more adaptive method to determine when to present unsolicited hints to students with high prior proficiency. (We did not test for the significance of the difference between the two correlation coefficients because the samples are not independent: Hints Needed are a subset of Hints Justified.)

Table 4 shows the correlation between posttest time and the unsolicited hint metrics.
First, Table 4 shows that the number of unsolicited hints given (top row) does not correlate with posttest time, suggesting that differences in posttest time between conditions cannot be attributed to the frequency of unsolicited hints. While HJR (second row) and HNR (third row) are significantly correlated with posttest time for the entire population, the Pearson correlation coefficients are less than 0.3, suggesting weak relationships. However, students with Low Prior Proficiency have a significant correlation (greater than 0.3) between posttest time and the unsolicited hint usage metrics HJR and HNR.

Table 1 shows that, over the entire population, significantly more (p < 0.05) unsolicited hints were given in the Assertions condition, with a main effect of Condition on the number of unsolicited hints given. Neither the main effect of Prior Proficiency nor the interaction effect was significant. It would be reasonable to expect that the frequency of hints might impact posttest performance. However, our correlation analysis shows that the significantly higher number of unsolicited hints given in the Assertions condition did not correlate with posttest performance for either solution length or time. Instead, the significant negative correlations of posttest length and time with Hint Needed Rate for students with low prior knowledge suggest that students in the Low group learned from using the unsolicited hints to achieve problem conclusions. These needed hints provided insight into efficient problem solving, by showing students optimal problem-solving steps. As shown in Table 1 above, students in the Assertions condition had higher HNR than students in the Messages condition.
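The two-factor ANCOVA used for H2, with Condition and Prior Proficiency as factors and the pretest metric as covariate, can be sketched with statsmodels on synthetic data (the column names and generated values are ours, not the study's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "cond": rng.choice(["Assertions", "Messages"], n),
    "prof": rng.choice(["Low", "High"], n),
    "pretest_len": rng.normal(7.5, 2.0, n),
})
# Synthetic posttest outcome driven only by the pretest covariate.
df["posttest_len"] = 14 + 0.5 * df["pretest_len"] + rng.normal(0, 2, n)

# Factors, their interaction, and the pretest covariate.
model = smf.ols("posttest_len ~ C(cond) * C(prof) + pretest_len",
                data=df).fit()
table = anova_lm(model, typ=2)
```

The `table` rows give F and p-values for each main effect, the interaction, and the covariate, paralleling the statistics reported above.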
Therefore, our results confirm hypothesis H2: there was an aptitude-treatment interaction effect in which Assertions helped students with low prior proficiency learn to construct more optimal (shorter) solutions more quickly on the posttest.

5.3 H3: Assertions foster productive persistence among students with low prior knowledge

We hypothesized that increased usage of unsolicited hints in the form of Assertions will lead students with low prior proficiency to exert persistent effort in training, and that this persistence will be productive (i.e., yield improved posttest performance). We clustered students on five features: two productivity measures (posttest solution length and time, where lower is better), two effort measures (time spent on unsolved (skipped) problems and the number of restarts), and unsolicited hint usage as measured by HJR. We used the Hint Justification Rate (HJR) instead of the Hint Needed Rate (HNR) since hints needed cannot be determined for unsolved problems.

Table 5: Selecting the number of clusters based on three cluster quality indices

The clustering analysis provides a deeper understanding of student behavior patterns involving productivity, effort, and proactive hint usage.
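The Ward clustering and majority-vote index selection can be sketched with SciPy and scikit-learn. The helper below is ours, applied here to synthetic well-separated data rather than the study's five features:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler

def pick_k(features, candidates=(2, 3, 4, 5)):
    """Ward hierarchical clustering on standardized features; choose k
    by majority vote of Silhouette / Calinski-Harabasz (higher is
    better) and Davies-Bouldin (lower is better)."""
    X = StandardScaler().fit_transform(features)
    Z = linkage(X, method="ward")
    labels = {k: fcluster(Z, k, criterion="maxclust") for k in candidates}
    votes = {k: 0 for k in candidates}
    for score, better in ((silhouette_score, max),
                          (calinski_harabasz_score, max),
                          (davies_bouldin_score, min)):
        vals = {k: score(X, labels[k]) for k in candidates}
        votes[better(vals, key=vals.get)] += 1
    best = max(votes, key=votes.get)
    return best, labels[best]

# Three tight, well-separated synthetic blobs: all three indices
# should agree on k = 3.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(c, 0.1, size=(30, 5)) for c in (0.0, 5.0, 10.0)])
best_k, cluster_labels = pick_k(data)
```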
Table 6: Centroids (Mean) of the three clusters using hierarchical clustering with Ward's method

An ANOVA on the effort metrics would not have helped us understand how student effort varies in tandem with both productivity and hint usage; the cluster analysis is therefore better geared towards answering H3 than an ANOVA. We performed the cluster analysis using hierarchical clustering with Ward's method on standardized features. We selected the number of clusters using a majority vote across three indices: Silhouette and Calinski-Harabasz, which both reward well-separated, internally cohesive clusters (higher values are better), and the Davies-Bouldin index, which rewards the same properties but for which lower values are better. Table 5 shows that using three clusters yields the best-quality clusters.

Table 6 shows the centroids of the three clusters. We used the class average (CA), i.e., the average over the entire population, to assess the clusters on each measure. The class averages for the five clustering features were as follows (note that lower posttest times and solution lengths are better):

• Posttest Time (min): CA = 42.81
• Posttest Solution Length: CA = 14.01
• Unsolved Problem Time (min): CA = 5.16
• Restarts: CA = 2.44
• Hint Justification Rate (HJR): CA = 0.80

One cluster performed better than the class average on all five features; in the following, we refer to this cluster as Productive - High Effort - High HJR. A second cluster is labeled Productive - Low Effort - High HJR; interestingly, many of the High Prior Proficiency students ended up in this low-effort cluster but did no better than the Assertions-Low group on the posttest. Lastly, a third cluster is labeled Unproductive - Low Effort - Low HJR.

Fig. 6: Profile of the three clusters based on Condition and Prior Proficiency

We then profiled each cluster based on the pairs of Condition and Prior Proficiency, as shown in Figure 6. Interestingly, the majority of the Assertions-Low group students are in the Productive - High Effort - High HJR cluster, and the majority of the Messages-Low group students are in the Unproductive - Low Effort - Low HJR cluster. Most of the students in the Assertions-High and Messages-High groups are in the Productive - Low Effort - High HJR cluster. Since we are interested in the Low Prior Proficiency group, we performed a chi-square test to compare the distribution of the Assertions-Low and Messages-Low students across the three clusters and found a significant difference (χ²(1, N = 41) = 24.…, p < 0.05). The majority of the Assertions-Low students are in the Productive - High Effort - High HJR cluster, with the highest effort and unsolicited hint usage in training and productive posttest results, and this confirms our H3 hypothesis.

6 Discussion

6.1 H1: Assertions increase the unsolicited hint usage for all students irrespective of their prior knowledge

The hints in our tutor suggest the most optimal next-step statement to derive for any given student problem-solving state. Similar to other tutors [47, 65], in this study we found that students rarely request on-demand help, irrespective of condition. However, our results suggest that the difference in unsolicited hint usage between Messages and Assertions can be attributed to presentation alone. We found that Assertions, specifically designed using the principles of contiguity, attention, expectation, and persuasion, significantly increased both the attention students pay to unsolicited hints (HJR) and the hints' influence on students' solutions (HNR), regardless of the students' prior knowledge. Conati and Manske suggested in [23] that students pay more attention to simpler hints. Assertions provide high immediacy (making the hint content immediately usable [7]) since they leverage both spatial contiguity [50], by placing information right where it is needed, and visual expectation [76], by formatting hints to make them more intuitive to follow. Studies have also found students' attitudes towards unsolicited hints to be an important factor in help avoidance [22, 65]. Persuasive factors like increasing perceived authority [20] through formatting and language can make justifying Assertions seem to be required.
Our results show that an unsolicited hint interface that combines persuasion, making hint usage seem required, with high immediacy, making it easy to see and do, can help overcome barriers to hint usage.

6.2 H2: Assertions will lead students with low prior knowledge to form shorter proofs faster in the posttest

Several studies have found ATI effects surrounding hint usage where students with low prior knowledge or proficiency benefit more from interventions [5, 41, 56]. In particular, Kardan and Conati [41] found that attention to hints affected student performance in a tutor for teaching constraint satisfaction problems, and students with low prior knowledge experienced more pronounced effects from an adaptive hint design intervention. While their intervention dealt with both an unsolicited hint interface (highlights to direct attention) and scaffolding (incremental textual hints), our study focuses only on the interface of unsolicited hints. Our ANCOVA results showed a significant aptitude-treatment interaction between Prior Proficiency {High, Low} and Condition {Assertions, Messages}. Using Tukey's HSD tests, we inferred that the Assertions-Low group outperformed the Messages-Low group in posttest solution length and time. We also observed a significant correlation of posttest solution length and time with Hint Needed Rate (HNR) for the Low Prior Proficiency group, suggesting that using more unsolicited hints as necessary components of their proofs helped this group learn better strategies.
Assertions are designed to encourage stu-dents to follow unsolicited hints that direct students toward optimal problem-solving strategies. Results from our empirical study support the notion thatAssertions promote productive persistence. Our cluster analysis showed thatthe majority of the Assertions-Low group exerted more effort (high persis-tence) during training, justified a higher proportion of unsolicited hints, andperformed better on the posttest than the class average. We also saw a higherproportion of the Messages-Low students in the cluster that exerted less effort(low persistence) in the training, justified a lower proportion of unsolicitedhints, and performed worse on the posttest than the class average. Interest-ingly, while most of the Assertions-Low group spent more time on unsolvedproblems in training, they took a significantly shorter time on the posttestwhile creating shorter posttest solutions, suggesting that the Assertions pro-moted productive persistence (i.e. time well spent) among students with lowprior proficiency.6.4 Assertions - a new genre of hintsOverall, this study showcases the importance of effective delivery for unso-licited hints, and a new genre of hints that we call Assertions. We believe thatproviding unsolicited hints as partially worked steps reduced the cognitiveload required for learning from them. Further, increasing spatial contiguityimproved students’ attention towards hints, and the isomorphic format mayhave made it easier for them to understand and use them in their solutions. Weobserved that Assertions led students with low prior knowledge to exert moreproductive persistence in training that resulted in better posttest performance,where they formed significantly shorter, more optimal, solutions in significantlyless time than their peers in the control condition who only received Messagehints. 
Assertions provide students with additional problem-solving resourcesthat can enable them to learn through the process of self-explaining (justi-fying) expert steps. We believe that Assertions may be particularly helpfulin multi-step domains, where providing students with partially-worked steps,right next to where they are needed, periodically, and in the same format asother problem-solving steps, could lead students to do more self-explanation(through justifying or completing the partially-worked steps) and by circum-venting help avoidance. voiding Help Avoidance 27 A limitation of this work is the difference in the timing of Assertions andMessages, which could have impacted the results. While the hint frequencycorrelation analysis showed that students’ posttest performance was not im-pacted by the number of unsolicited hints given, we recognize that the hinttiming may have had an impact on students. This limitation arises from thefact that we are modifying a real adaptive system to achieve practical improve-ments. These two types of hints were designed for different purposes. Messageswere intended to help someone who was struggling but forgot about the helpfeature. Assertions were intended to be proactive for students who wouldn’task for help no matter what.Assertions were designed to address the problemthat we observed, that Messages were not helping enough people improve theirperformance or learning.
7 Conclusion

In this study, we investigated the impact of Assertions, a new genre of unsolicited hints, on hint usage and posttest performance within a data-driven tutoring system. This work is novel in that it leveraged the interface alone to address the help avoidance problem. This work did not seek to regulate students' help-seeking; rather, we sought to make unsolicited hints more effective through changes in their delivery. The Assertions hint interface made the intelligent tutor more effective, significantly improving unsolicited hint usage for all students. We further demonstrated aptitude-treatment interaction effects where students with low prior proficiency receiving Assertions performed better on the posttest, in terms of both time and solution length. Our cluster analysis shows that students with low prior knowledge who received Assertions demonstrated more productive persistence, in that they exerted more persistent effort even when failing during training and used a higher proportion of unsolicited hints, and they performed better on the posttest than their low-proficiency peers who received Messages.

There are three main limitations to this study. First, Assertions were provided significantly more frequently than Messages. Assertions did not seem to have a negative impact on learning, but rather leveled the playing field for students with low prior proficiency; moreover, our analyses demonstrated that it was not hint frequency but the Assertions interface alone that improved hint usage. The second limitation is that Assertions appeared randomly and were not adapted to individual students. Our results confirm our hypothesis that Assertions have a differential impact on students with different incoming proficiency, suggesting that there may be benefits to using individual factors to determine when to provide Assertions. A third limitation arises from splitting students into two prior proficiency groups. While some studies investigate finer-grained partitions, e.g.
low, medium/average, and high groups [41], we refrained from doing so to maintain sufficiently high sample sizes within each group.
This study was a necessary first step to identify a hint interface that could address the help avoidance problem. Future work could study the generalizability of this transformative new genre of unsolicited hints, which uses the design principles of contiguity, attention, and expectation to increase hint immediacy and persuasion, to reduce help avoidance in other tutors. Within our tutor, we plan to apply reinforcement learning and other machine learning techniques to derive an adaptive policy that decides when and whether Assertions should be provided to individual students. Since Assertions promote productive persistence among students with low prior knowledge, we also plan to develop a model that provides Assertions when the tutor detects or predicts unproductive behaviors [44, 45].
Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1726550, "Integrated Data-driven Technologies for Individualized Instruction in STEM Learning Environments", led by Min Chi and Tiffany Barnes. We would like to thank Nicholas Lytle ([email protected]) for suggesting edits to the introduction section to enhance its clarity.
References
1. Aleven, V., Koedinger, K.R.: Limitations of student control: Do students know when they need help? In: International Conference on Intelligent Tutoring Systems, pp. 292–303. Springer (2000)
2. Aleven, V., Mclaren, B., Roll, I., Koedinger, K.: Toward meta-cognitive tutoring: A model of help seeking with a cognitive tutor. International Journal of Artificial Intelligence in Education (2), 101–128 (2006)
3. Aleven, V., Ogan, A., Popescu, O., Torrey, C., Koedinger, K.: Evaluating the effectiveness of a tutorial dialogue system for self-explanation. In: International Conference on Intelligent Tutoring Systems, pp. 443–454. Springer (2004)
4. Almeda, M.V.Q., Baker, R.S., Corbett, A.: Help avoidance: When students should seek help, and the consequences of failing to do so. In: Meeting of the Cognitive Science Society, vol. 2428, p. 2433 (2017)
5. Arroyo, I., Beck, J.E., Beal, C.R., Wing, R., Woolf, B.P.: Analyzing students' response to help provision in an elementary mathematics intelligent tutoring system. In: Papers of the AIED-2001 Workshop on Help Provision and Help Seeking in Interactive Learning Environments, pp. 34–46. Citeseer (2001)
6. Ausin, M.S., Azizsoltani, H., Barnes, T., Chi, M.: Leveraging deep reinforcement learning for pedagogical policy induction in an intelligent tutoring system. In: Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019), vol. 168, p. 177. ERIC
7. Bakke, S.: Immediacy in user interfaces: An activity theoretical approach. In: International Conference on Human-Computer Interaction, pp. 14–22. Springer (2014)
8. Barnes, T., Stamper, J.: Automatic hint generation for logic proof tutoring using historical data. Journal of Educational Technology & Society (1), 3 (2010)
9. Barnes, T., Stamper, J., Croy, M.: Using Markov decision processes for automatic hint generation. Handbook of Educational Data Mining (2011)
10. Barnes, T., Stamper, J.C., Lehmann, L., Croy, M.J.: A pilot study on logic proof tutoring using hints generated from historical student data. In: EDM, pp. 197–201 (2008)
11. Bartholomé, T., Stahl, E., Pieschl, S., Bromme, R.: What matters in help-seeking? A study of help effectiveness and learner-related factors. Computers in Human Behavior (1), 113–129 (2006)
12. Beck, J.E., Gong, Y.: Wheel-spinning: Students who fail to master a skill. In: International Conference on Artificial Intelligence in Education, pp. 431–440. Springer (2013)
13. Borghans, L., Duckworth, A.L., Heckman, J.J., Ter Weel, B.: The economics and psychology of personality traits. Journal of Human Resources (4), 972–1059 (2008)
14. Botelho, A., Varatharaj, A., Patikorn, T., Doherty, D., Adjei, S., Beck, J.: Developing early detectors of student attrition and wheel spinning using deep learning. IEEE Transactions on Learning Technologies (2019)
15. Bunt, A., Conati, C., Muldner, K.: Scaffolding self-explanation to improve learning in exploratory learning environments. In: International Conference on Intelligent Tutoring Systems, pp. 656–667. Springer (2004)
16. Butcher, K.R., Aleven, V.: Integrating visual and verbal knowledge during classroom learning with computer tutors. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 29 (2007)
17. Butcher, K.R., Aleven, V.: Using student interactions to foster rule–diagram mapping during problem solving in an intelligent tutoring system. Journal of Educational Psychology (4), 988 (2013)
18. Chi, M.T., Bassok, M.: Learning from examples via self-explanations. Tech. rep., University of Pittsburgh, Learning Research and Development Center (1988)
19. Chi, M.T., De Leeuw, N., Chiu, M.H., LaVancher, C.: Eliciting self-explanations improves understanding. Cognitive Science (3), 439–477 (1994)
20. Cialdini, R.B.: Influence: Science and Practice, vol. 4. Pearson Education, Boston, MA (2009)
21. Cody, C., Mostafavi, B.: Investigating the impact of unsolicited next-step and subgoal hints on dropout in a logic proof tutor. In: Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, pp. 705–705. ACM (2017)
22. Conati, C., Jaques, N., Muir, M.: Understanding attention to adaptive hints in educational games: An eye-tracking study. International Journal of Artificial Intelligence in Education (1-4), 136–161 (2013)
23. Conati, C., Manske, M.: Evaluating adaptive feedback in an educational computer game. In: International Workshop on Intelligent Virtual Agents, pp. 146–158. Springer (2009)
24. Conati, C., VanLehn, K.: Toward computer-based support of meta-cognitive skills: A computational framework to coach self-explanation (2000)
25. Cronbach, L.J., Snow, R.E.: Aptitudes and Instructional Methods: A Handbook for Research on Interactions. Irvington (1977)
26. Davies, W., Cormican, K.: An analysis of the use of multimedia technology in computer aided design training: Towards effective design goals. Procedia Technology, 200–208 (2013)
27. Deke, J., Haimson, J.: Valuing student competencies: Which ones predict postsecondary educational attainment and earnings, and for whom? Final report. Mathematica Policy Research, Inc. (2006)
28. DiCerbo, K.E.: Game-based assessment of persistence. Journal of Educational Technology & Society (1), 17–28 (2014)
29. Dillard, J.P., Seo, K.: Affect and persuasion. In: Dillard, J.P., Shen, L. (eds.) The Sage Handbook of Persuasion, pp. 150–166 (2013)
30. Dumdumaya, C., Rodrigo, M.M.: Predicting task persistence within a learning-by-teaching environment. In: Proceedings of the 26th International Conference on Computers in Education, pp. 1–10 (2018)
31. Duong, H., Zhu, L., Wang, Y., Heffernan, N.T.: A prediction model that uses the sequence of attempts and hints to better predict knowledge: "Better to attempt the problem first, rather than ask for a hint". In: EDM, pp. 316–317 (2013)
32. Fossati, D., Di Eugenio, B., Ohlsson, S., Brown, C., Chen, L.: Generating proactive feedback to help students stay on track. In: International Conference on Intelligent Tutoring Systems, pp. 315–317. Springer (2010)
33. Fossati, D., Di Eugenio, B., Ohlsson, S., Brown, C., Chen, L.: Data driven automatic feedback generation in the iList intelligent tutoring system. Technology, Instruction, Cognition and Learning (1), 5–26 (2015)
34. Healey, C., Enns, J.: Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics (7), 1170–1188 (2012)
35. Heckman, J.J., Stixrud, J., Urzua, S.: The effects of cognitive and noncognitive abilities on labor market outcomes and social behavior. Journal of Labor Economics (3), 411–482 (2006)
36. Hegarty, M., Just, M.A.: Constructing mental models of machines from text and diagrams. Journal of Memory and Language (6), 717–742 (1993)
37. Jin, W., Barnes, T., Stamper, J., Eagle, M.J., Johnson, M.W., Lehmann, L.: Program representation for automatic hint generation for a data-driven novice programming tutor. In: International Conference on Intelligent Tutoring Systems, pp. 304–309. Springer (2012)
38. Jin, W., Lehmann, L., Johnson, M., Eagle, M., Mostafavi, B., Barnes, T., Stamper, J.: Towards automatic hint generation for a data-driven novice programming tutor. In: Workshop on Knowledge Discovery in Educational Data, 17th ACM Conference on Knowledge Discovery and Data Mining. Citeseer (2011)
39. Kai, S., Almeda, M.V., Baker, R.S., Heffernan, C., Heffernan, N.: Decision tree modeling of wheel-spinning and productive persistence in skill builders. JEDM — Journal of Educational Data Mining (1), 36–71 (2018)
40. Kanfer, R., Ackerman, P.L.: Motivation and cognitive abilities: An integrative/aptitude-treatment interaction approach to skill acquisition. Journal of Applied Psychology (4), 657 (1989)
41. Kardan, S., Conati, C.: Providing adaptive support in an interactive simulation for learning: An experimental evaluation. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 3671–3680. ACM (2015)
42. Koedinger, K.R., Aleven, V., Heffernan, N., McLaren, B., Hockenberry, M.: Opening the door to non-programmers: Authoring intelligent tutor behavior by demonstration. In: International Conference on Intelligent Tutoring Systems, pp. 162–174. Springer (2004)
43. Liu, Z., Mostafavi, B., Barnes, T.: Combining worked examples and problem solving in a data-driven logic tutor. In: International Conference on Intelligent Tutoring Systems, pp. 347–353. Springer (2016)
44. Maniktala, M., Barnes, T., Chi, M.: Extending the hint factory: Towards modelling productivity for open-ended problem-solving. In: Proceedings of the 13th International Conference on Educational Data Mining (2020)
45. Maniktala, M., Cody, C., Isvik, A., Lytle, N., Chi, M., Barnes, T.: Extending the hint factory for the assistance dilemma: A novel, data-driven HelpNeed predictor for proactive problem-solving help. JEDM — Journal of Educational Data Mining (2020)
46. Marwan, S., Lytle, N., Williams, J.J., Price, T.: The impact of adding textual explanations to next-step hints in a novice programming environment. In: Proceedings of the 2019 ACM Conference on Innovation and Technology in Computer Science Education, pp. 520–526. ACM (2019)
47. Mathews, M., Mitrovic, A.: How does students' help-seeking behaviour affect learning? In: International Conference on Intelligent Tutoring Systems, pp. 363–372. Springer (2008)
48. McLaren, B.M., Koedinger, K.R., Schneider, M., Harrer, A., Bollen, L.: Bootstrapping novice data: Semi-automated tutor authoring using student log files (2004)
49. McLaren, B.M., Lim, S.J., Koedinger, K.R.: When is assistance helpful to learning? Results in combining worked examples and intelligent tutoring. In: International Conference on Intelligent Tutoring Systems, pp. 677–680. Springer (2008)
50. Moreno, R., Mayer, R.E.: Cognitive principles of multimedia learning: The role of modality and contiguity. Journal of Educational Psychology (2), 358 (1999)
51. Mostafavi, B., Barnes, T.: Data-driven proficiency profiling: Proof of concept. In: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, pp. 324–328 (2016)
52. Mostafavi, B., Barnes, T.: Evolution of an intelligent deductive logic tutor using data-driven elements. International Journal of Artificial Intelligence in Education (1), 5–36 (2017)
53. Mostafavi, B., Eagle, M., Barnes, T.: Towards data-driven mastery learning. In: Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, pp. 270–274 (2015)
54. Mostafavi, B., Zhou, G., Lynch, C., Chi, M., Barnes, T.: Data-driven worked examples improve retention and completion in a logic tutor. In: International Conference on Artificial Intelligence in Education, pp. 726–729. Springer (2015)
55. Muir, M., Conati, C.: An analysis of attention to student-adaptive hints in an educational game. In: International Conference on Intelligent Tutoring Systems, pp. 112–122. Springer (2012)
56. Murray, R.C., VanLehn, K.: A comparison of decision-theoretic, fixed-policy and random tutorial action selection. In: International Conference on Intelligent Tutoring Systems, pp. 114–123. Springer (2006)
57. Murray, T.: An overview of intelligent tutoring system authoring tools: Updated analysis of the state of the art. In: Authoring Tools for Advanced Technology Learning Environments, pp. 491–544. Springer (2003)
58. Nelson-Le Gall, S.: Help-seeking: An understudied problem-solving skill in children. Developmental Review (3), 224–246 (1981)
59. Newell, A., Simon, H.A., et al.: Human Problem Solving, vol. 104. Prentice-Hall, Englewood Cliffs, NJ (1972)
60. Paaßen, B., Hammer, B., Price, T.W., Barnes, T., Gross, S., Pinkwart, N.: The continuous hint factory: Providing hints in vast and sparsely populated edit distance spaces. arXiv preprint arXiv:1708.06564 (2017)
61. Paunonen, S.V., Ashton, M.C.: Big five predictors of academic achievement. Journal of Research in Personality (1), 78–90 (2001)
62. Price, T., Zhi, R., Barnes, T.: Evaluation of a data-driven feedback algorithm for open-ended programming. International Educational Data Mining Society (2017)
63. Price, T.W., Dong, Y., Barnes, T.: Generating data-driven hints for open-ended programming. EDM, 191–198 (2016)
64. Price, T.W., Dong, Y., Lipovac, D.: iSnap: Towards intelligent tutoring in novice programming environments. In: Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, pp. 483–488 (2017)
65. Price, T.W., Liu, Z., Cateté, V., Barnes, T.: Factors influencing students' help-seeking behavior while programming with human and computer tutors. In: Proceedings of the 2017 ACM Conference on International Computing Education Research, pp. 127–135 (2017)
66. Price, T.W., Zhi, R., Barnes, T.: Hint generation under uncertainty: The effect of hint quality on help-seeking behavior. In: International Conference on Artificial Intelligence in Education, pp. 311–322. Springer (2017)
67. Price, T.W., Zhi, R., Dong, Y., Lytle, N., Barnes, T.: The impact of data quantity and source on the quality of data-driven hints for programming. In: International Conference on Artificial Intelligence in Education, pp. 476–490. Springer (2018)
68. Puustinen, M.: Help-seeking behavior in a problem-solving situation: Development of self-regulation. European Journal of Psychology of Education (2), 271 (1998)
69. Razzaq, L., Heffernan, N.T.: Hints: Is it better to give or wait to be asked? In: International Conference on Intelligent Tutoring Systems, pp. 349–358. Springer (2010)
70. Rivers, K., Koedinger, K.R.: Data-driven hint generation in vast solution spaces: A self-improving Python programming tutor. International Journal of Artificial Intelligence in Education (1), 37–64 (2017)
71. Roll, I., Aleven, V., McLaren, B.M., Ryu, E., Baker, R.S.J.d., Koedinger, K.R.: The Help Tutor: Does metacognitive feedback improve students' help-seeking actions, skills and learning? In: International Conference on Intelligent Tutoring Systems, pp. 360–369. Springer (2006)
72. Shen, S., Chi, M.: Reinforcement learning: The sooner the better, or the later the better? In: Proceedings of the 2016 Conference on User Modeling, Adaptation and Personalization, pp. 37–44. ACM (2016)
73. Shen, S., Mostafavi, B., Lynch, C., Barnes, T., Chi, M.: Empirically evaluating the effectiveness of POMDP vs. MDP towards the pedagogical strategies induction. In: International Conference on Artificial Intelligence in Education, pp. 327–331. Springer (2018)
74. Snow, R.E.: Aptitude-treatment interaction as a framework for research on individual differences in psychotherapy. Journal of Consulting and Clinical Psychology (2), 205 (1991)
75. Stamper, J., Barnes, T., Lehmann, L., Croy, M.: The Hint Factory: Automatic generation of contextualized help for existing computer aided instruction. In: Proceedings of the 9th International Conference on Intelligent Tutoring Systems Young Researchers Track, pp. 71–78 (2008)
76. Summerfield, C., Egner, T.: Expectation (and attention) in visual cognition. Trends in Cognitive Sciences (9), 403–409 (2009)
77. Sweller, J.: Cognitive load during problem solving: Effects on learning. Cognitive Science (2), 257–285 (1988)
78. Sweller, J.: Instructional Design in Technical Areas. Camberwell, Victoria: ACER Press (1999)
79. Sweller, J.: The worked example effect and human cognition. Learning and Instruction (2006)
80. Sweller, J.: Human cognitive architecture. Handbook of Research on Educational Communications and Technology, pp. 369–381 (2008)
81. Tchétagni, J.M., Nkambou, R.: Hierarchical representation and evaluation of the student in an intelligent tutoring system. In: International Conference on Intelligent Tutoring Systems, pp. 708–717. Springer (2002)
82. Timms, M.J.: Using item response theory (IRT) to select hints in an ITS. Frontiers in Artificial Intelligence and Applications, 213 (2007)
83. Ueno, M., Miyazawa, Y.: IRT-based adaptive hints to scaffold learning in programming. IEEE Transactions on Learning Technologies (4), 415–428 (2017)
84. VanLehn, K.: The behavior of tutoring systems. International Journal of Artificial Intelligence in Education (3), 227–265 (2006)
85. Ventura, M., Shute, V.: The validity of a game-based assessment of persistence. Computers in Human Behavior (6), 2568–2572 (2013)
86. Ventura, M., Shute, V., Zhao, W.: The relationship between video game use and a performance-based measure of persistence. Computers & Education (1), 52–58 (2013)
87. Villesseche, J., Le Bohec, O., Quaireau, C., Nogues, J., Besnard, A.L., Oriez, S., De La Haye, F., Noel, Y., Lavandier, K.: Enhancing reading skills through adaptive e-learning. Interactive Technology and Smart Education (2018)
88. Weerasinghe, A., Mitrovic, A.: Enhancing learning through self-explanation. In: International Conference on Computers in Education, 2002. Proceedings, pp. 244–248. IEEE (2002)
89. Weerasinghe, A., Mitrovic, A.: Supporting self-explanation in an open-ended domain. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp. 306–313. Springer (2004)
90. Wobbrock, J.O., Findlater, L., Gergle, D., Higgins, J.J.: The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 143–146. ACM (2011)
91. Wood, H., Wood, D.: Help seeking, learning and contingent tutoring. Computers & Education (2-3), 153–169 (1999)
92. Zhou, G., Price, T.W., Lynch, C., Barnes, T., Chi, M.: The impact of granularity on worked examples and problem solving. In: CogSci (2015)

A: Unsolicited Hint Metrics for each prior proficiency group
Prior