Toward more accurate measurement of the impact of online instructional design on students' ability to transfer physics problem-solving skills
Kyle M. Whitcomb, Matthew W. Guthrie, Chandralekha Singh, Zhongzhou Chen
Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, 15260
Department of Physics, University of Central Florida, Orlando, FL, 32816
Two earlier studies demonstrated that students' behavior data from a sequence of online learning modules can be analyzed to measure their ability to transfer their knowledge of solving one physics problem to a similar new one. In addition, adding an on-ramp module that develops basic skills improved students' transfer ability. In the current study, we improved the accuracy of the transfer measurement by identifying and excluding students who interacted with the learning modules differently from what was expected, and examined two possible mechanisms by which the on-ramp module could improve student transfer. Based on a two-by-two framework of self-regulated learning, we hypothesized that students with a performance-avoidance oriented goal are more likely to consistently guess on their initial attempts, leaving a distinctive pattern in the log data and resulting in an underestimation of students' actual transfer ability. We divided the remaining student sample according to whether they passed the on-ramp module before or after accessing the instructional materials and compared their performance to a propensity-score-matched sample from a previous semester. Improvement in transfer ability was found to primarily occur among students who passed the on-ramp module before learning. A possible explanation is that the on-ramp module served as an effective reminder for students who already possess the essential skills, but may be insufficient to develop those skills for other students. Our results suggest that online learning modules can be an accurate and flexible tool in assessing students' transfer ability. Further, our results demonstrate that the analysis of online learning data can produce more accurate and insightful results when taking into account details of student learning behavior and learning strategy.

I. INTRODUCTION
In addition to learning physics concepts, a key objective of physics instruction is to facilitate students' development of robust problem-solving skills and the ability to transfer those skills to novel contexts [1-3]. How instructional methods can be developed and evaluated to enhance students' transfer ability is a highly valuable research question for STEM education. However, most existing instruments that assess students' conceptual understanding [4, 5] or problem-solving skills at scale [6, 7] are not designed to directly measure their ability to transfer, since the tests do not explicitly provide students the opportunity or the resources to learn during the test. Therefore, developing new methods that are not only able to accurately measure students' ability to transfer, but also shed light on the effectiveness of learning materials and instructional practices, is a valuable initial step in the effort to improve students' transfer ability.

In an earlier paper [8] we proposed a new method for measuring students' ability to transfer their learning from online problem-solving tutorials to new problem contexts with different surface features by analyzing the log of clickstream data of students interacting with a sequence of online learning modules (OLMs). Each module contains both learning materials and assessment problems, as explained in more detail in sections I A and II A. We found that while introductory-level college physics students are highly capable of learning to solve specific problems from online tutorials, they struggled to transfer their learning to a slightly modified problem given immediately afterward on the next module.
In a follow-up study [9], we tested two different methods to enhance students' ability to transfer in an OLM sequence and found evidence suggesting that the addition of an "on-ramp" module (a scaffolding module designed to solidify essential basic skills and concepts [10, 11]) prior to the tutorial resulted in significant improvement in students' ability to transfer their knowledge in the rotational kinematics sequence.

Those early results raised two important questions that the current study tries to answer. First, since the OLMs are assigned for students to complete on their own, what fraction of students interacted with the modules as we had intended? For those who did not, to what extent did their behavior, as described in section I B, affect the validity of our measurement of students' transfer ability, and how can we mitigate those impacts for a more accurate measurement? Second, while earlier analyses suggested that the "on-ramp" modules may be effective, what is the mechanism by which those modules enhance students' ability to transfer? Are the benefits of those modules exclusive only to students who interacted with them in a certain way, as explained in section I C?
A. Measuring transfer in an OLM sequence
As will be explained in more detail in section II, each OLM consists of an instructional component (IC) and an assessment component (AC) which contains one or two problems, as demonstrated in Fig. 2 adapted from [8]. Students are required to complete at least one attempt on the AC before being allowed to study the IC, a design that was inspired by the frameworks of preparation for future learning [1] and productive failure [12]. Students who failed their first attempt can learn to solve the specific type of problem from the IC. When students complete two or more OLMs in sequence on the same topic involving similar assessment problems, their required first attempt on each subsequent module serves as an assessment of their ability to transfer their learning from the IC of the previous module. When more than two modules are involved, students' performance on later modules could be attributed to indirect transfer due to a preparation for future learning effect; that is, completing the first module better prepares students to learn from the second module, which in turn increases performance on the third and subsequent modules.

Data from OLMs can be visualized in a "zigzag" plot (Fig. 1, adapted from Ref. [9]), developed in earlier studies and explained in detail in section II D. Every two points represent the total assessment passing percentage of the student population on attempts before and after learning from the IC of each module. Students' ability to learn to solve a specific problem is reflected by an increase in passing percentage from Pre to Post on the same module. The odd-numbered points in Fig. 1 (i.e., those labeled "Pre" as well as "Quiz") show passing rates on initial attempts prior to learning from the IC of each module, and an increase from one point to the next reflects students' ability to transfer their learning from the previous module(s).
B. Students’ different learning strategies and possible impact on assessment
Measuring students' transfer ability from their performance on OLM assessment attempts requires that the majority of students either seriously took the required first attempt of each module, or made a quick guess only when they felt that they could not solve the problem. However, research on students' self-regulated learning (SRL) processes suggests that learners may choose to guess regardless of their ability or confidence to solve an assessment problem, according to their motivational goal orientation.

FIG. 1. An example of a zigzag plot, adapted from Ref. [9]. Each point represents the passing rate of students either before ("Pre") or after ("Post") being given access to the instructional material in each module. Passing rates in the Post stage of a module are cumulative with Pre-stage attempts. See section II D for more details.

Using a 2 × 2 achievement goal framework, a learner's goal orientation can be classified along both the definition dimension and the valence dimension. On the definition dimension, the learner can be either mastery-oriented or performance-oriented. Simply put, mastery-oriented learners focus more on and are mostly motivated by the intrinsic value of mastering the subject, while performance-oriented learners are motivated by extrinsic values (see also the summary of Pintrich's model [15, 16] by Winne [17]), such as obtaining the homework credit for each module.
On the valence dimension, learners either focus on a "positive possibility to approach (i.e., success)" or on a "negative possibility to avoid (i.e., failure)."

It is easy to imagine that if a learner has a performance-avoidance type achievement goal, then they are likely to adopt a strategy akin to a "coping mode," described by Boekaerts [18] as primarily focusing on "preserving [study] resources and avoiding damage." In the context of interacting with OLM modules, a student with a performance-avoidance goal is likely to randomly submit an answer on their required first attempt to avoid "unnecessary failure" and save time, and then study the IC to ensure success on their next attempt. For those students, their initial attempts reflect their learning strategy, rather than their level of content mastery, transfer ability, or even their confidence. If some students in our sample did adopt such a strategy, then the log data of their interactions with the modules will have two characteristic features: first, their initial attempts will frequently be significantly shorter in time and have much lower passing rates when compared to other students, especially on easier modules; second, their passing rates on attempts after study will be similar to everyone else's.

If a non-negligible fraction of students adopt the performance-avoidance strategy, their data could significantly distort the estimation of transfer ability for the entire student population. Properly identifying and removing those students from the sample will improve the accuracy of the measurement using data from OLMs.
C. Distinguishing between two different mechanisms of the on-ramp module
In our earlier study [9], we found that the addition of an "on-ramp" module at the beginning of the OLM sequence resulted in better performance on the required first attempts of subsequent modules compared to students from the previous semester. The "on-ramp" modules contain practice problems designed to develop and enhance students' proficiency in essential skills necessary for problem solving. However, students who passed the AC of the on-ramp module on their required first attempt (or on attempts before accessing the IC) could choose to move directly on to the next module without interacting with the IC of the on-ramp module. Therefore, if the on-ramp module enhances students' transfer ability by improving their proficiency in essential skills, then the improvement will be less significant or nonexistent among those who passed on the first attempt, and only observed among those who failed their initial attempt and accessed the IC. Alternatively, if the on-ramp module mainly serves as a "reminder" for students to activate existing knowledge of essential skills, then the benefit should be more significant among those who passed on the first attempt, and less so for those who studied the IC. Distinguishing between those two mechanisms can better guide the future development of instructional materials to enhance students' ability to transfer.
[Figure 2 layout: AC1 On-Ramp (IC1: Solution), AC2 Tutorial (IC2: Tutorial), AC3 Example 1 (IC3: Solution), AC4 Example 2 (IC4: Solution), AC5 Quiz (IC5: Empty); AC = Assessment Component, IC = Instructional Component; OLMs 1 and 4 marked "Added in 2018".]

FIG. 2. The sequence of Online Learning Modules (OLMs) designed for this experiment. Each OLM contains an assessment component (AC) and an instructional component (IC). Students are required to make at least one attempt on the AC first, then are allowed to view the IC, and go on to make subsequent attempts on the AC. OLMs 1 and 4 were added for the 2018 implementation.
D. Research questions
To summarize, in this study we will answer the following three research questions:
RQ1: What fraction of students adopted a performance-avoidance strategy when interacting with OLM sequences?

RQ2: To what extent did the results from previous studies change after students with performance-avoidance strategies were removed from the sample?

RQ3: Did the on-ramp module enhance students' ability to transfer by improving students' proficiency in essential skills, or by serving as a "reminder" for those who are already proficient?

The first two research questions are important for the accuracy of the measurements and lay the groundwork for answering RQ3. In sections II A to II C, we explain in detail the structure and implementation of the OLM sequences, as well as the data collection process. In section II D, we present our operational definitions of key concepts such as assessment passing percentage and performance-avoidance strategy in the context of OLMs, and outline our analysis procedure for measuring transfer and answering the research questions. In section III, we present the results of our analysis, which are interpreted in section IV A; their implications are discussed in the rest of section IV.
II. METHODS

A. OLM Sequence Structure
The study was conducted using online learning modules (OLMs) [8, 9, 19, 20] implemented on the open-source Obojobo platform [21] developed by the Center for Distributed Learning at the University of Central Florida (UCF). Each OLM contains an assessment component (AC) and an instructional component (IC) (see Fig. 2). Students have 5 attempts on the AC, which contains 1-2 multiple-choice problems, and must make at least one attempt before being allowed to access the IC. The IC generally contains instructional text, figures, and/or practice questions. The specific contents of the IC used in each of the modules in the current study will be detailed in the next section. In an OLM sequence, a student must either pass or use up all five attempts on the AC before being allowed to access the next module. Students' interaction with each OLM can be divided into three stages: the pre-study (Pre) stage, in which a student makes one or more attempts on the AC; the study stage, in which those who failed in the Pre stage study the IC; and the post-study (Post) stage, in which students make additional attempts on the AC. A small fraction of students have also been observed to skip the study stage after multiple failed attempts in the Pre stage. A student is counted as passing an AC if the student correctly answers all problems in the AC within their first 3 attempts, including both Pre and Post stage attempts. In other words, students who either failed on all 5 attempts or passed on their 4th or 5th attempt are considered as failing the module in the current study.
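The attempt-accounting rules above can be sketched as a short Python function. This is an illustrative sketch only: the function name, the attempt-list format, and the `studied_ic` marker are our own assumptions, not the actual analysis code (which is described in Ref. [25]).

```python
def module_outcome(attempts, studied_ic):
    """Classify one student's attempts on one OLM.

    attempts:   chronological list of booleans, True = passed the AC
                on that attempt (hypothetical data layout).
    studied_ic: index of the first attempt made after opening the IC,
                or None if the student never opened the IC.

    A student passes the module only by succeeding within the first 3
    of at most 5 attempts, counting Pre and Post stage attempts together.
    """
    attempts = attempts[:5]  # the platform allows at most 5 attempts
    cut = len(attempts) if studied_ic is None else studied_ic
    pre = attempts[:cut]     # Pre-study stage: attempts before the IC
    passed_pre = any(pre[:3])
    # Post-stage pass: succeeded within the first 3 attempts overall,
    # but only after gaining access to the IC.
    passed_post = any(attempts[:3]) and not passed_pre
    return {"passed_pre": passed_pre,
            "passed_post": passed_post,
            "passed": passed_pre or passed_post}
```

For example, a student who fails the required first attempt, studies the IC, and passes on the second attempt counts as a Post-stage pass, while a student who first succeeds on the 4th attempt is counted as failing the module.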
B. Study Setup
In Fall 2017, two sequences each containing 3 OLMs (specifically, OLMs 2, 3, and 5 in Fig. 2) were assigned as homework to 235 students enrolled in a calculus-based introductory physics class at UCF [8]. The 6 modules were worth 3% of the total course credit. The first OLM sequence teaches students to solve Atwood machine type problems with blocks hanging from massive pulleys using knowledge of rotational kinematics (RK). The second sequence teaches students to solve angular collision problems, such as a girl jumping onto a merry-go-round, using knowledge of conservation of angular momentum (AM). Both sequences are designed to develop and measure students' ability to transfer problem-solving skills to slightly different contexts. The modules used in this study are free and available to the public at Ref. [22].

The AC of each OLM contains one problem that can be solved using the same physics principles as other ACs in the OLM sequence. The IC of OLM 2 (Fig. 2) contains an online tutorial developed by DeVore and Singh [23, 24], in the form of a sequence of practice questions. The IC of OLM 3 contains a worked solution to the AC problem, and the IC of OLM 5 is empty since it is intended to serve the role of a quiz.

In Fall 2018, the two OLM sequences were each modified by adding two additional OLMs (shown in Fig. 2) and implemented again in the same course, taught by the same instructor, as homework to 241 students. Both sequences were assigned as homework worth 3% of the total course credit. The first new module in each sequence is the "on-ramp" module (OLM 1 in Fig. 2), which contains an AC focusing on one or more basic procedural skills necessary for solving the subsequent ACs in the OLM sequence. For the RK sequence, the on-ramp module presents students with two Atwood machine problems of the simplest form, involving one or two blocks hanging at the same radius from a single massive pulley.
For the AM sequence, the on-ramp module addressed the common student difficulty of calculating both the magnitude and sign of the angular momentum of an object traveling in a straight line about a fixed point in space. The second new module in each sequence is the "Example 2" module (OLM 4 in Fig. 2), which contains in its AC a new problem that shares the same deep structure as the one in the previous module but differs in surface features. The IC of the module was designed in two formats: a compare-contrast format, in which students were given questions that prompted them to compare the similarity and difficulty of the solutions to the problems in AC3 and AC4, and a guided tutorial format, consisting of a series of tutorial-style scaffolding questions guiding them through the solution of the problem in AC4. Each format was provided to half of the student population at random. We found no difference between the two cohorts in terms of students' behavior and performance on the subsequent module 5 [9].
C. Data Collection and Selection
Anonymized clickstream data were collected from the Obojobo platform for all students who interacted with the OLM sequences. The following types of information were extracted from the log data following the same procedure explained in detail in Ref. [25]: the number of attempts on the AC of each module, the outcome of each attempt (pass/fail), the start time and duration of each attempt, and the start times of interaction with the IC. The duration of interaction with the IC was also extracted but was not used in the current analysis.

In addition, students' exam scores and overall course grades, each on a 0-100 scale, were also collected, anonymized, and linked to each student's log data. The exam scores consist of two midterm exams, each counting for 12% of the final course grade, and a final exam counting for 16% of the final course grade. The final course grade also contains scores from homework, lab, and classroom participation.

In order to maintain a consistent sample across our analyses, only data from students who attempted every module in a sequence at least once are included. Data from seven students were removed from the 2017 RK sequence for this reason, and two or fewer students were removed from each of the other OLM sequences. Data from 202 students were retained for the RK sequence in 2017, 198 students for the RK sequence in 2018, 198 students for the AM sequence in 2017, and 189 students for the AM sequence in 2018.

In the Fall 2017 implementation, half of the students were given the option to skip the initial AC attempt of OLM 2 (the first OLM in that implementation) and proceed directly to the tutorial in the IC. However, we found in an earlier study [8] that very few students chose to exercise this option, and among those who did there was no detectable impact on subsequent problem-solving behavior and outcomes. Therefore, in the current analysis, we combined those two groups into one. Similarly, for the Fall 2018 semester, we combined data from students encountering the two different versions of the IC in module 4, since no difference in their behavior and outcome on module 5 could be detected [9].
D. Data Analysis
To identify and estimate the number of students adopting a performance-avoidance strategy (RQ1), we will analyze the frequency of students making a very brief first attempt on each module. As explained in section I B, students who adopt this strategy are more likely to consistently guess on their first attempts to gain access to the instructional material.

In the current analysis, we categorize each student's first attempt as a "Brief Attempt" (BA) if the duration of the attempt is less than 35 seconds. This cutoff time is inherited from a careful analysis of similar OLMs in an earlier OLM study [25], and chosen as a conservative estimate of the minimum amount of time needed to read and attempt a given question. Students are categorized into three "BA groups" based on the number of BAs on the first four modules: 0-1 BAs, 2-3 BAs, and 4 BAs. Table I shows the number of students in each BA group for each OLM sequence. BAs on the quiz module were not considered since there was no IC for the students to access. Due to the conservative BA duration estimation, we believe that the 0-1 BA group contains the fewest performance-avoidance focused students, whose members are most likely to make valid first attempts on the AC.

TABLE I. The number of students in each OLM sequence by their number of Brief Attempts. The Brief Attempt groups consist of those who had 0-1, 2-3, or 4 Brief Attempts throughout the first four modules.

To examine the extent to which the behavior of performance-avoidance focused students affects the measurement of transfer (
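The Brief Attempt classification described above can be sketched in a few lines of Python (the 35-second cutoff is from the text; the function and variable names are our own, not those of the original analysis):

```python
BRIEF_CUTOFF = 35.0  # seconds; conservative cutoff inherited from Ref. [25]

def ba_group(first_attempt_durations):
    """Assign a student to a Brief Attempt (BA) group based on the
    durations (in seconds) of their required first attempts on the
    first four modules of a sequence (the quiz module is excluded)."""
    n_ba = sum(d < BRIEF_CUTOFF for d in first_attempt_durations[:4])
    if n_ba <= 1:
        return "0-1 BA"
    if n_ba <= 3:
        return "2-3 BA"
    return "4 BA"
```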
RQ2), we will compare the Pre and Post stage passing rates of the three BA groups on all modules in the two sequences, and plot the outcomes in Fig. 3. Following the convention established in two previous studies [8, 9], the pass rates are defined as follows. On each OLM except module 5, the pass rate (P) of students was calculated for both the Pre-study (P_pre) and Post-study (P_post) attempts. The Pre-study pass rate on each module is calculated as

P_pre = N_pre / N_total,    (1)

with N_pre being the number of students who passed Pre-study and N_total being the total number of students who attempted the module. Similarly, the Post-study pass rate on each module is calculated as

P_post = (N_pre + N_post) / N_total,    (2)

with N_post being the number of students who passed Post-study. By including both N_pre and N_post, the Post passing rate reflects the total number of students able to pass the assessment after being given access to the IC, assuming that students who passed in the Pre stage could also pass in the Post stage if re-tested. This definition is similar to the Post-test score in a Pre-test/Post-test setting. For module 5, the passing rate does not distinguish between Pre and Post stages because the IC of the module contains no instructional resources. The P_pre on modules 2-4 and P on module 5 measure students' ability to transfer their learning from modules 1-4. We hypothesized that the 0-1 BA group would have significantly better performance than the other two BA groups on their Pre stage attempts on modules 2, 3, and 4, because the other two BA groups are more likely to forfeit the first attempt opportunity regardless of their ability to solve the problem.
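Equations (1) and (2) amount to the following small computation (a minimal sketch; the function and argument names are ours):

```python
def pass_rates(n_pre, n_post, n_total):
    """Pre- and Post-study pass rates as defined in Eqs. (1) and (2).

    n_pre:   students who passed during Pre-study attempts
    n_post:  students who first passed during Post-study attempts
    n_total: all students who attempted the module
    """
    p_pre = n_pre / n_total
    # The Post rate is cumulative: it includes students who already
    # passed in the Pre stage, analogous to a Post-test score.
    p_post = (n_pre + n_post) / n_total
    return p_pre, p_post
```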
We further hypothesized that the Post-study pass rates for each BA group would be very similar, because P_post reflects students' ability to learn from the modules and solve the specific problem (if they are not already proficient), and the dominant factor separating the three groups is students' engagement strategy, not their ability to learn from the modules.

Finally, to examine the mechanism by which the on-ramp module improves transfer of knowledge (RQ3), we first separate the student sample from Fall 2018 into three "on-ramp cohorts":

• Pass On-Ramp Pre: students who passed the on-ramp AC before accessing the IC,
• Pass On-Ramp Post: students who passed the on-ramp AC only after accessing the IC, and
• Fail: students who did not pass the on-ramp AC within 3 attempts.

Based on the analysis outcomes for RQ1 and RQ2, we only retained data from the 0-1 BA group for this analysis, since our analysis indicates that data from the other two BA groups could result in an underestimation of students' ability to transfer.

Next, we identified three comparable cohorts of students from the 2017 sample. We first retained students who made only 0-1 BAs on modules in the 2017 sequence, then identified comparable cohorts using propensity score matching, since the general ability of the 0-1 BA group could differ from that of the rest of the student population. Propensity scores were constructed using a combination of standardized scores from two midterm exams and one final exam in both semesters. Each exam is largely identical across the two semesters, with one or two questions being replaced or modified.

FIG. 3. Students are grouped by their number of Brief Attempts throughout the OLM sequences for (a) Rotational Kinematics and (b) Angular Momentum. The pass rates of these groups in each module are plotted along with their standard error.

FIG. 4. Using propensity score matching on course exam scores, a subset of 2017 students are matched to 2018 students with 0-1 Brief Attempts. The pass rates of these two samples are then plotted separately for (a) Rotational Kinematics (RK) and (b) Angular Momentum (AM).

Pass rates on all modules in both sequences are compared between the three 2018 cohorts and the three propensity-score-matched 2017 cohorts in order to distinguish between the two possible mechanisms of the on-ramp module. If the "improve proficiency" effect was dominant, then the performance differences should be observed mostly among the Pass On-Ramp Post cohort and its matched cohort in 2017. If the "reminder" effect was dominant, then the differences will be observed for the Pass On-Ramp Pre cohort and its counterpart.

Propensity score matching was performed using R [26] and the MatchIt package [27]. The MatchIt algorithm retains all treated data and attempts to find either an exact one-to-one match or balance the overall covariate distribution for the control data. Data analysis, statistical testing, and visual analysis were conducted using R [26] and the tidyverse package [28].
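The original matching was performed with R's MatchIt; as a rough illustration of the underlying idea, a greedy one-to-one nearest-neighbor match on precomputed propensity scores can be sketched in Python. Everything here (names, data layout, greedy ordering) is a simplified assumption, not a reimplementation of MatchIt:

```python
def greedy_nn_match(treated, control):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping student ids to propensity scores
    (hypothetical layout). All treated units are retained, mirroring
    the MatchIt behavior described above.
    """
    available = dict(control)
    pairs = {}
    for sid, score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break  # more treated than control units
        # pick the control unit with the closest propensity score
        best = min(available, key=lambda cid: abs(available[cid] - score))
        pairs[sid] = best
        del available[best]  # matching is without replacement
    return pairs
```

In practice the propensity scores would typically come from a logistic regression of group membership on the standardized exam scores, and MatchIt additionally checks covariate balance after matching.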
III. RESULTS
First, we estimate the fraction of students that adopted a performance-avoidance strategy (RQ1) by listing the number of students with 0-1, 2-3, or 4 BAs on the first four modules of each sequence in Table I. The result shows that, even with relatively conservative criteria for classifying brief attempts, we still identified 10-15% of the students who made a brief attempt on each of the four modules. On the other hand, around 50% of the students belong to the 0-1 BA group. Within the 0-1 BA group, the number of students in each on-ramp cohort is listed in Table II for each OLM sequence.

Figure 3 shows the Pre and Post stage pass rates of students on modules 2-5, separated by the number of BAs on the first four modules. Pass rates from the two sequences are plotted separately: the RK sequence in Fig. 3a and the AM sequence in Fig. 3b. In both Fig. 3a and Fig. 3b, the most prominent difference between the three BA groups is that students in the 0-1 BA group significantly outperformed the other two groups in Pre stage attempts for the Example 1 module (OLM 3, Fig. 2) (Fisher's exact test on 2 × 2 contingency tables, p < .001 for the RK sequence and p = 0.001 for the AM sequence). Students in the 0-1 BA group also outperformed the 2-3 BA group on RK Tutorial Post stage attempts (p = 0.… and p = 0.…). These results support our hypothesis (RQ2) that students adopting a performance-avoidance strategy could have a measurable impact on the estimation of the transfer ability of the student population using performance data from Obojobo. Therefore, to mitigate this impact, we limit ourselves to studying the 0-1 BA group for both the 2017 and 2018 student samples in the following analysis.

We compared the pass rates of the 0-1 BA group from 2018 on modules 2-5 with a propensity-score-matched subsample in 2017 who also had 0-1 BAs on the first two modules. The pass rates for both sequences are shown in Fig. 4, while the p-values from Fisher's exact test comparing each pair of data points on the figures are listed in the first two rows of Table III. All p-values are adjusted for Type I error due to conducting multiple tests using the Benjamini-Hochberg method [29]. The data show that there are significant performance differences in the success rate between the two student populations on Tutorial Pre and Example 1 Pre attempts in the RK sequence, whereas the difference in the AM sequence is less prominent, possibly due to the success rate being very high in both samples. The differences are similar in nature but larger in magnitude compared to what was observed in our earlier study that did not consider alternative learning strategies [9], suggesting that the earlier study could have underestimated the transfer ability of the student population.

To examine the mechanism by which the on-ramp module improves the transfer of knowledge (RQ3), we divided the 2018 0-1 BA population into three cohorts. Since the Fail cohort is much smaller than the other two cohorts and too small for reliable propensity score matching, we will only analyze the Pass On-Ramp Pre and Pass On-Ramp Post cohorts (see Table II). In Fig.
5, we compare the performance of those two cohorts to their propensity-score-matched counterparts in the Fall 2017 semester. The pass rates of the two cohorts on the same module sequence are shown side by side. Data from the RK sequence are shown on the top row (Fig. 5a and Fig. 5b) and the AM sequence on the bottom row (Fig. 5c and Fig. 5d). The adjusted p-values of Fisher's exact test between each pair of points are listed in the last four rows of Table III.

It can be seen from Fig. 5 that the Pass On-Ramp Pre cohort is responsible for the majority of the differences on Pre-study attempts between the 2017 and 2018 samples. For the RK sequence, the differences are not statistically significant for the Pass On-Ramp Post cohort after p-value adjustment [29]. For the AM sequence, none of the differences were statistically significant after p-value adjustment for either the Pass On-Ramp Pre or Pass On-Ramp Post cohorts.

TABLE II. The number of students in each OLM sequence that fall into each on-ramp cohort among those with 0-1 Brief Attempts. The cohorts consist of those who passed during on-ramp Pre-study attempts ("Pass On-Ramp Pre"), those who passed during on-ramp Post-study attempts ("Pass On-Ramp Post"), and those that failed the on-ramp assessment ("Fail"). Since the on-ramp module was only included in Fall 2018, only students from 2018 are included here.

OLM Sequence | Pass On-Ramp Pre | Pass On-Ramp Post | Fail
RK           | 32               | 57                | 11
AM           | 32               | 47                | 12
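The Benjamini-Hochberg adjustment [29] applied to the p-values above is a standard step-up procedure; a generic sketch in Python (the actual analysis used R, so this is not the original code) is:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control).

    The adjusted value for the k-th smallest p-value is
    min over j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for pos in range(m - 1, -1, -1):  # from the largest p-value down
        i = order[pos]
        running_min = min(running_min, pvals[i] * m / (pos + 1))
        adjusted[i] = min(running_min, 1.0)
    return adjusted
```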
IV. DISCUSSION

A. Interpretation of results
We found that roughly half of the students frequently or consistently adopted a learning strategy that is likely motivated by a performance-avoidance goal: making abnormally short required first attempts on some or all of the first four modules. These brief attempts could have been generated by students who were either guessing or copying the answer from a peer. While an occasional brief attempt may indicate a lack of confidence in one's knowledge, continuous brief attempts on multiple modules are more likely a strategic choice to save time on the task, since the performance differences on attempts after studying the learning material are much smaller. This strategy fits well with Boekaerts's description of students being in a "coping mode," in which their goal is to pass the module while saving time and avoiding "unnecessary" possible failures [18].

For students who adopted the performance-avoidance strategy, their transfer ability can no longer be measured using OLMs, as their brief Pre-study attempts on the following modules do not always reflect their true ability to transfer their learning from the current module. Our analysis suggests that including data from those students resulted in an underestimation of students' ability to transfer knowledge from the Tutorial module (module 2) to the Example 1 module (module 3) in our earlier study, although most of the qualitative conclusions remain the same. However, it must be clarified that the current analysis is also not an accurate measurement of the transfer ability of the entire student population. Instead, it is an accurate measurement for the subpopulation who did not frequently guess on their initial attempts.

An alternative explanation of our observation is that students who frequently adopt the strategy have a lower level of overall mastery of the subject (and possibly a higher level of self-awareness of their lack of knowledge).
Therefore, they would not have been able to pass the required Pre stage attempt even if they had tried, and thus including those students would not result in an underestimation of students' transfer ability. However, while this may be true for some students, we do not think that this explanation applies to the majority of students in the 2-3 BA and 4 BA groups. This is because their performance on modules 2, 4, and 5, as well as on the Post stage of the Example 1 module, is either similar to that of the 0-1 BA group or only slightly worse, which suggests that their overall physics abilities are similar, and that the differences observed in the Pre stage attempts on the Example 1 module are therefore mostly due to differences in strategic choice.

Another major finding of the current analysis is that the benefit of the on-ramp module in facilitating transfer (as measured by Pre stage attempts of subsequent modules) predominantly occurs among students who can pass the on-ramp module before accessing the instructional component. The difference is much more prominent for the more challenging RK sequence, and less so for the easier AM sequence. This unexpected observation holds true even after we used propensity score matching between the two semesters to control for the fact that the Pass On-Ramp Pre cohort likely includes students with better physics knowledge or higher motivation than students in the Pass On-Ramp Post cohort.

A possible explanation comes from the basic principles of information processing theory [30, 31]. For students who already possess the essential skills or procedures, attempting the on-ramp module assessment prompted them to retrieve those skills from long-term memory and retain them in working memory.
All or part of those skills remained either in working memory or in a more active state when the students moved on to the subsequent modules, thereby freeing up cognitive capacity for those students to better comprehend the additional complexity of the Tutorial and Example 1 modules. On the other hand, for those who had not yet mastered those essential skills, the instructional component (IC) of the on-ramp module was sufficient for them to pass the assessment, but not enough for them to achieve a higher level of
TABLE III. A list of p-values from Fisher's exact test comparing the performance of 2018 students and matched 2017 students on each common assessment in the listed figure. The p-values have been adjusted using the Benjamini-Hochberg method [29].

Fig.   Tutorial Pre   Tutorial Post   Example 1 Pre   Example 1 Post   Quiz
4a     0.003          0.330           <0.001          <0.001           0.054
4b     0.333          0.265           0.166           0.306            0.166
5a     0.001          1.000           0.001           0.395            0.438
5b     0.498          1.000           0.008           0.028            0.028
5c     0.764          0.766           0.267           1.000            0.766
5d     0.835          0.835           0.835           0.835            0.835
[Figure 5 panels: pass rate ("Percentage of Students who Pass") on each assessment.]
(a) Rotational Kinematics: Matching 2018 Pass On-Ramp Pre students.
(b) Rotational Kinematics: Matching 2018 Pass On-Ramp Post students.
(c) Angular Momentum: Matching 2018 Pass On-Ramp Pre students.
(d) Angular Momentum: Matching 2018 Pass On-Ramp Post students.
FIG. 5. Using propensity score matching on course exam scores, a subset of 2017 students is matched to 2018 students with 0-1 Brief Attempts in either the Pass On-Ramp Pre [(a) and (c)] or Pass On-Ramp Post [(b) and (d)] cohorts. The pass rates of these two cohorts are plotted separately for Rotational Kinematics (RK) [(a) and (b)] and Angular Momentum (AM) [(c) and (d)].

proficiency. Therefore, activating those skills on the subsequent modules required a higher amount of cognitive load, limiting students' ability to process the additional complexities.

A straightforward and testable implication of this explanation is that providing students with more practice problems on those essential skills will increase their ability to learn and transfer on subsequent modules. In addition, it may be beneficial to distribute those practice opportunities rather than clustering them immediately prior to the tutorial sequence, as distributed practice has been shown to benefit skill acquisition and recall [32, 33], and distributed retrieval practice of factual knowledge has been shown to improve students' physics exam scores [34].

It must be pointed out that our use of propensity score matching to control for the fact that our selected student populations likely have different knowledge and motivation than the rest of the population is far from perfect, since overall exam scores may not fully reflect knowledge of the specific topic involved. A more accurate propensity score could be constructed in the future, when additional modules on the same topic are created and assigned to students prior to the tutorial sequence. Such modules have been created and administered in the Fall 2019 semester, enabling more accurate analysis to be conducted in the future.
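The matching step itself (the paper reports using the MatchIt package in R [27]) reduces, in the single-covariate case, to pairing each 2018 student with the unmatched 2017 student whose course exam score is closest: with one covariate, the logistic propensity score is monotone in that covariate, so matching on the raw score yields the same pairs. A minimal greedy sketch, using made-up exam scores rather than the study's data:

```python
def greedy_match(treated, control):
    """1:1 nearest-neighbor matching without replacement on a single
    scalar covariate (e.g., overall course exam score).
    Returns a list of (treated_index, control_index) pairs."""
    available = set(range(len(control)))
    pairs = []
    # Match treated units in a fixed order; production implementations
    # often sort by propensity score first and offer a caliper.
    for i, t in enumerate(treated):
        if not available:
            break
        j = min(available, key=lambda k: abs(control[k] - t))
        pairs.append((i, j))
        available.remove(j)  # without replacement
    return pairs

# Hypothetical exam scores (0-100): 2018 cohort (treated) vs. the
# larger 2017 pool (control).
scores_2018 = [88, 75, 92, 60]
scores_2017 = [74, 90, 61, 85, 70, 95]
print(greedy_match(scores_2018, scores_2017))
# -> [(0, 1), (1, 0), (2, 5), (3, 2)]
```

Greedy order-dependent matching is only a sketch of the idea; MatchIt's nearest-neighbor method additionally estimates a propensity model and reports balance diagnostics, which matter when more covariates than a single exam score are used.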
B. Implications for Online Education Research
Our analysis shows that students' behaviors in a self-regulated online learning environment frequently deviate from what was intended or expected by the instructor. Those unexpected behaviors, such as frequently guessing on problems (or, in some cases, cheating), can have a substantial impact on the outcomes of data analysis if not properly accounted for. Excluding students with unexpected behavior improves the accuracy of the measurement, but also limits the measurement to only those who interacted with online learning resources as expected. However, this should not be seen as a limitation that is unique to online education research, since students completing not-for-credit paper-and-pencil assessments can also adopt avoidance-goal-oriented strategies. In fact, the ability to detect the presence of diverse student behavior, and to correct for its impact in data analysis, is a unique strength of online education research. It can also motivate and facilitate the future development of instructional strategies to reduce performance-avoidance strategies among students in an online environment.

Furthermore, in our earlier analysis [9] of the same module sequences, we found that instructional resources designed based on well-documented learning science principles may not always generate the expected outcomes, due to variations in the actual implementation. The current analysis further reveals that even when an instructional resource did result in the expected outcome improvement, the underlying mechanism may be different from what was expected. In this case, modules that were designed to train students' proficiency in essential skills actually benefited those who were already proficient and did not go through the training, by serving as a reminder for them to activate those skills. These results demonstrate the high level of complexity and unpredictability involved in designing and creating effective instructional resources.
Moreover, they highlight the importance of discipline-based education researchers' role as "Education Engineers" who try to bridge the gap between learning theories and actual instructional practices.

Last but not least, the current study is an exploratory attempt at evaluating the effectiveness of instructional materials by comparing the outcomes of students enrolled in two consecutive semesters and controlling for extrinsic variance using propensity score matching. Compared to the more common method of conducting randomized AB experiments [35, 36], the current method is significantly easier to implement in actual classroom settings and introduces fewer disruptions for students. In addition, this method allows for a larger sample size, since each group consists of an entire class rather than a fraction of the class. While it introduces more variance due to the treatment and control groups coming from different semesters, we demonstrated that the impact of that variance can be controlled to some extent by methods such as propensity score matching. This less disruptive study setup can be particularly valuable in certain situations, such as during the current COVID-19 outbreak, which presents students with many obstacles as institutions shift to fully remote instruction and instructors are reluctant to introduce more potential sources of confusion.
ACKNOWLEDGMENTS
The authors would like to thank the Learning Systems and Technology team at UCF for developing the Obojobo platform. Dr. Michelle Taub provided critical and insightful comments on students' self-regulated learning. This research is partly supported by NSF Grants DUE-1845436 and DUE-1524575 and the Alfred P. Sloan Foundation Grant G-2018-11183.

[1] J. D. Bransford and D. L. Schwartz, "Rethinking transfer: A simple proposal with multiple implications," Review of Research in Education, 61-100 (1999).
[2] H. S. Broudy, "Types of knowledge and purposes of education," in Schooling and the Acquisition of Knowledge, edited by Richard C. Anderson, Rand J. Spiro, and William E. Montague (Routledge, 1977) pp. 1-17.
[3] Douglas K. Detterman, "The case for the prosecution: Transfer as an epiphenomenon," in Transfer on Trial: Intelligence, Cognition, and Instruction, edited by D. K. Detterman and R. J. Sternberg (Ablex Publishing, 1993) pp. 1-24.
[4] David Hestenes, Malcolm Wells, and Gregg Swackhamer, "Force concept inventory," The Physics Teacher, 141-158 (1992).
[5] Ronald K. Thornton and David R. Sokoloff, "Assessing student learning of Newton's laws: The force and motion conceptual evaluation and the evaluation of active learning laboratory and lecture curricula," American Journal of Physics, 338-352 (1998).
[6] Andrew Pawl, Analia Barrantes, Carolin Cardamone, Saif Rayyan, and David E. Pritchard, "Development of a mechanics reasoning inventory," AIP Conference Proceedings, 287-290 (2012).
[7] Jeffrey Marx and Karen Cummings, "Development of a survey instrument to gauge students' problem-solving abilities," AIP Conference Proceedings, 221-224 (2010).
[8] Zhongzhou Chen, Kyle M. Whitcomb, and Chandralekha Singh, "Measuring the effectiveness of online problem-solving tutorials by multi-level knowledge transfer," in Physics Education Research Conference 2018, PER Conference, edited by A. Traxler, Y. Cao, and S. Wolf (Physics Education Research Topical Group and the American Association of Physics Teachers, Washington, DC, 2018).
[9] Zhongzhou Chen, Kyle M. Whitcomb, Matthew W. Guthrie, and Chandralekha Singh, "Evaluating the effectiveness of two methods to improve students' problem solving performance after studying an online tutorial," in Physics Education Research Conference 2019, PER Conference (Physics Education Research Topical Group and the American Association of Physics Teachers, Provo, UT, 2019).
[10] Brendon D. Mikula and Andrew F. Heckler, "Framework and implementation for improving physics essential skills via computer-based practice: Vector math," Phys. Rev. Phys. Educ. Res., 010122 (2017).
[11] Nicholas T. Young and Andrew F. Heckler, "Observed hierarchy of student proficiency with period, frequency, and angular frequency," Phys. Rev. Phys. Educ. Res., 010104 (2018).
[12] Manu Kapur, "Productive failure in mathematical problem solving," Instructional Science, 523-550 (2009).
[13] Andrew J. Elliot and Kou Murayama, "On the measurement of achievement goals: Critique, illustration, and application," Journal of Educational Psychology, 613-628 (2008).
[14] Andrew J. Elliot and Holly A. McGregor, "A 2 × 2 achievement goal framework," Journal of Personality and Social Psychology, 501-519 (2001).
[15] Paul R. Pintrich, "The role of goal orientation in self-regulated learning," in Handbook of Self-Regulation, edited by Monique Boekaerts, Paul R. Pintrich, and Moshe Zeidner (Academic Press, San Diego, 2000) pp. 451-502.
[16] Paul R. Pintrich, "A conceptual framework for assessing motivation and self-regulated learning in college students," Educational Psychology Review, 385-407 (2004).
[17] Philip H. Winne, "Self-regulated learning," in International Encyclopedia of the Social & Behavioral Sciences, Vol. 21, edited by J. D. Wright (Elsevier, Oxford, UK, 2015) 2nd ed., pp. 535-540.
[18] Monique Boekaerts and Markku Niemivirta, "Self-regulated learning: Finding a balance between learning goals and ego-protective goals," in Handbook of Self-Regulation, edited by Monique Boekaerts, Paul R. Pintrich, and Moshe Zeidner (Academic Press, San Diego, 2000) pp. 417-450.
[19] Zhongzhou Chen, Geoffrey Garrido, Zachary Berry, Ian Turgeon, and Francisca Yonekura, "Designing online learning modules to conduct pre- and post-testing at high frequency," in Physics Education Research Conference 2017, PER Conference (Physics Education Research Topical Group and the American Association of Physics Teachers, Cincinnati, OH, 2017) pp. 84-87.
[20] Zhongzhou Chen, Sunbok Lee, and Geoffrey Garrido, "Re-designing the structure of online courses to empower educational data mining," in Proceedings of the 11th International Conference on Educational Data Mining, EDM 2018, Buffalo, NY, USA, July 15-18, 2018 (2018).
[21] Zachary Berry, Ian Turgeon, and Francisca Yonekura, Obojobo Next (2020).
[22] Kyle M. Whitcomb, Matthew W. Guthrie, and Zhongzhou Chen, Online tutorial sequences for 2017-2018 experiments (2020).
[23] Seth DeVore, Emily Marshman, and Chandralekha Singh, "Challenge of engaging all students via self-paced interactive electronic learning tutorials for introductory physics," Physical Review Physics Education Research, 010127 (2017).
[24] Chandralekha Singh and Daniel Haileselassie, "Developing problem-solving skills of students taking introductory physics via web-based tutorials," Journal of College Science Teaching, 42-49 (2010).
[25] Zhongzhou Chen, Mengyu Xu, Geoffrey Garrido, and Matthew W. Guthrie, "Relationship between students' online learning behavior and course performance: What contextual information matters?" Phys. Rev. Phys. Educ. Res., 010138 (2020).
[26] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2019).
[27] Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart, "MatchIt: Nonparametric preprocessing for parametric causal inference," Journal of Statistical Software, Articles, 1-28 (2011).
[28] Hadley Wickham, tidyverse: Easily Install and Load the 'Tidyverse' (2017), R package version 1.2.1.
[29] Yoav Benjamini and Yosef Hochberg, "Controlling the false discovery rate: A practical and powerful approach to multiple testing," Journal of the Royal Statistical Society: Series B (Methodological), 289-300 (1995).
[30] John Sweller, Paul Ayres, and Slava Kalyuga, Cognitive Load Theory (Springer New York, 2011).
[31] Herbert A. Simon, "Information-processing theory of human problem solving," in Handbook of Learning & Cognitive Processes: V. Human Information (Lawrence Erlbaum, 1978) Chap. 1, pp. 271-295.
[32] John Dunlosky, Katherine A. Rawson, Elizabeth J. Marsh, Mitchell J. Nathan, and Daniel T. Willingham, "Improving students' learning with effective learning techniques: Promising directions from cognitive and educational psychology," Psychological Science in the Public Interest, 4-58 (2013).
[33] Charles Henderson, José P. Mestre, and Linda L. Slakey, "Cognitive science research can improve undergraduate STEM instruction: What are the barriers?" Policy Insights from the Behavioral and Brain Sciences, 51-60 (2015).
[34] Vegard Gjerde, Bodil Holst, and Stein Dankert Kolstø, "Retrieval practice of a hierarchical principle structure in university introductory physics: Making stronger students," Physical Review Physics Education Research (2020), 10.1103/physrevphyseducres.16.013103.
[35] Zhongzhou Chen and Gary Gladding, "How to make a good animation: A grounded cognition model of how visual representation design affects the construction of abstract physics knowledge," Phys. Rev. ST Phys. Educ. Res., 010111 (2014).
[36] Zhongzhou Chen, Christopher Chudzicki, Daniel Palumbo, Giora Alexandron, Youn-Jeng Choi, Qian Zhou, and David E. Pritchard, "Researching for better instructional methods using AB experiments in MOOCs: Results and challenges," Research and Practice in Technology Enhanced Learning 11 (2016).