Toward more accurate measurement of the impact of online instructional design on students' ability to transfer physics problem-solving skills
Kyle M. Whitcomb, Matthew W. Guthrie, Chandralekha Singh, Zhongzhou Chen
Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA, 15260
Department of Physics, University of Central Florida, Orlando, FL, 32816
Two earlier studies demonstrated that students' behavior data from a sequence of online learning modules can be analyzed to measure their ability to transfer their knowledge of solving one physics problem to a similar new one. In addition, adding an on-ramp module that develops basic skills improved students' transfer ability. In the current study, we improved the accuracy of the transfer measurement by identifying and excluding students who interacted with the learning modules differently from what was expected, and examined two possible mechanisms by which the on-ramp module could improve student transfer. Based on a two-by-two framework of self-regulated learning, we hypothesized that students with a performance-avoidance oriented goal are more likely to consistently guess on their initial attempts, leaving a distinctive pattern in the log data and resulting in an underestimation of students' actual transfer ability. We divided the remaining student sample according to whether they passed the on-ramp module before or after accessing the instructional materials and compared their performance to a propensity-score-matched sample from a previous semester. Improvement in transfer ability was found to primarily occur among students who passed the on-ramp module before learning. A possible explanation is that the on-ramp module served as an effective reminder for students who already possess the essential skills, but may be insufficient to develop those skills for other students. Our results suggest that online learning modules can be an accurate and flexible tool in assessing students' transfer ability. Further, our results demonstrate that the analysis of online learning data can produce more accurate and insightful results when taking into account details of student learning behavior and learning strategy.

I. INTRODUCTION
In addition to learning physics concepts, a key objective of physics instruction is to facilitate students' development of robust problem-solving skills and the ability to transfer those skills to novel contexts [1-3]. How instructional methods can be developed and evaluated to enhance students' transfer ability is a highly valuable research question for STEM education. However, most existing instruments that assess students' conceptual understanding [4, 5] or problem-solving skills at scale [6, 7] are not designed to directly measure their ability to transfer, since the tests do not explicitly provide students the opportunity or the resources to learn during the test. Therefore, developing new methods that are not only able to accurately measure students' ability to transfer, but also shed light on the effectiveness of learning materials and instructional practices, is a valuable initial step in the effort to improve students' transfer ability.

In an earlier paper [8] we proposed a new method for measuring students' ability to transfer their learning from online problem-solving tutorials to new problem contexts with different surface features by analyzing the log of clickstream data of students interacting with a sequence of online learning modules (OLMs). Each module contains both learning materials and assessment problems, as explained in more detail in sections I A and II A. We found that while introductory-level college physics students are highly capable of learning to solve specific problems from online tutorials, they struggled to transfer their learning to a slightly modified problem given immediately afterward on the next module.
In a follow-up study [9], we tested two different methods to enhance students' ability to transfer in an OLM sequence and found evidence suggesting that the addition of an "on-ramp" module (a scaffolding module designed to solidify essential basic skills and concepts [10, 11]) prior to the tutorial resulted in significant improvement in students' ability to transfer their knowledge in the rotational kinematics sequence.

Those early results raised two important questions that the current study tries to answer. First, since the OLMs are assigned for students to complete on their own, what fraction of students interacted with the modules as we had intended? For those who did not, to what extent did their behavior, as described in section I B, affect the validity of our measurement of students' transfer ability, and how can we mitigate those impacts for a more accurate measurement? Second, while earlier analyses suggested that the "on-ramp" modules may be effective, what is the mechanism by which those modules enhance students' ability to transfer? Are the benefits of those modules exclusive only to students who interacted with them in a certain way, as explained in section I C?
A. Measuring transfer in an OLM sequence
As will be explained in more detail in section II, each OLM consists of an instructional component (IC) and an assessment component (AC) which contains one or two problems, as demonstrated in Fig. 2 adapted from [8]. Students are required to complete at least one attempt on the AC before being allowed to study the IC, a design that was inspired by the frameworks of preparation for future learning [1] and productive failure [12]. Students who failed their first attempt can learn to solve the specific type of problem from the IC. When students complete two or more OLMs in sequence on the same topic involving similar assessment problems, their required first attempt on each subsequent module serves as an assessment of their ability to transfer their learning from the IC of the previous module. When more than two modules are involved, students' performance on later modules could be attributed to indirect transfer due to a preparation for future learning effect; that is, completing the first module better prepares students to learn from the second module, which in turn increases performance on the third and subsequent modules.

Data from OLMs can be visualized in a "zigzag" plot (Fig. 1, adapted from Ref. [9]), developed in earlier studies and explained in detail in section II D. Every two points represent the total assessment passing percentage of the student population on attempts before and after learning from the IC of each module. Students' ability to learn to solve a specific problem is reflected by an increase in passing percentage from Pre to Post on the same module. The odd-numbered points in Fig. 1 (i.e., those labeled "Pre" as well as "Quiz") show passing rates on initial attempts prior to learning from the IC of each module, and an increase from one point to the next reflects students' ability to transfer their learning from the previous module(s).
B. Students’ different learning strategies and possible impact on assessment
Measuring students' transfer ability from their performance on OLM assessment attempts requires that the majority of students either seriously took the required first attempt of each module, or made a quick guess only when they felt that they could not solve the problem. However, research on students' self-regulated learning (SRL) processes suggests that learners may choose to guess regardless of their ability or confidence to solve an assessment problem, according to their motivational goal orientation.

FIG. 1. An example of a zigzag plot, adapted from Ref. [9]. Each point represents the passing rate of students either before ("Pre") or after ("Post") being given access to the instructional material in each module. Passing rates in the Post stage of a module are cumulative with Pre-stage attempts. See section II D for more details.

Using a 2 × 2 achievement goal framework, a learner's goal orientation can be classified along both the definition dimension and the valence dimension. On the definition dimension, the learner can be either mastery-oriented or performance-oriented. Simply put, mastery-oriented learners focus more on and are mostly motivated by the intrinsic value of mastering the subject, while performance-oriented learners are motivated by extrinsic values (see also the summary of Pintrich's model [15, 16] by Winne [17]), such as obtaining the homework credit for each module.
On the valence dimension, learners either focus on a "positive possibility to approach (i.e., success)" or on a "negative possibility to avoid (i.e., failure)."

It is easy to imagine that if a learner has a performance-avoidance type achievement goal, then they are likely to adopt a strategy akin to a "coping mode," described by Boekaerts [18] as primarily focusing on "preserving [study] resources and avoiding damage." In the context of interacting with OLM modules, a student with a performance-avoidance goal is likely to randomly submit an answer on their required first attempt to avoid "unnecessary failure" and save time, and then study the IC to ensure success on their next attempt. For those students, their initial attempts reflect their learning strategy, rather than their level of content mastery, transfer ability, or even their confidence. If some students in our sample did adopt such a strategy, then the log data of their interactions with the modules will have two characteristic features: first, their initial attempts will frequently be significantly shorter in time and have much lower passing rates when compared to other students, especially on easier modules; second, their passing rates on attempts after study will be similar to everyone else's.

If a non-negligible fraction of students adopt the performance-avoidance strategy, their data could significantly distort the estimation of transfer ability for the entire student population. Properly identifying and removing those students from the sample will improve the accuracy of the measurement using data from OLMs.
C. Distinguishing between two different mechanisms of the on-ramp module
In our earlier study [9], we found that the addition of an "on-ramp" module at the beginning of the OLM sequence resulted in better performance on the required first attempts of subsequent modules compared to students from the previous semester. The "on-ramp" modules contain practice problems designed to develop and enhance students' proficiency in essential skills necessary for problem solving. However, students who passed the AC of the on-ramp module on their required first attempt (or on attempts before accessing the IC) could choose to move directly on to the next module without interacting with the IC of the on-ramp module. Therefore, if the on-ramp module enhances students' transfer ability by improving their proficiency in essential skills, then the improvement will be less significant or nonexistent among those who passed on the first attempt, and only observed among those who failed their initial attempt and accessed the IC. Alternatively, if the on-ramp module mainly serves as a "reminder" for students to activate existing knowledge of essential skills, then the benefit should be more significant among those who passed on the first attempt, and less so for those who studied the IC. Distinguishing between those two mechanisms can better guide the future development of instructional materials to enhance students' ability to transfer.
[Figure 2 layout: AC1 On-Ramp (IC1: Solution), AC2 Tutorial (IC2: Tutorial), AC3 Example 1 (IC3: Solution), AC4 Example 2 (IC4: Solution), AC5 Quiz (IC5: Empty); AC = Assessment Component, IC = Instructional Component; OLMs 1 and 4 marked "Added in 2018".]

FIG. 2. The sequence of Online Learning Modules (OLMs) designed for this experiment. Each OLM contains an assessment component (AC) and an instructional component (IC). Students are required to make at least one attempt on the AC first, then are allowed to view the IC, and go on to make subsequent attempts on the AC. OLMs 1 and 4 were added for the 2018 implementation.
D. Research questions
To summarize, in this study we will answer the following three research questions:
RQ1: What fraction of students adopted a performance-avoidance strategy when interacting with OLM sequences?

RQ2: To what extent did the results from previous studies change after students with performance-avoidance strategies were removed from the sample?

RQ3: Did the on-ramp module enhance students' ability to transfer by improving students' proficiency in essential skills, or by serving as a "reminder" for those who are already proficient?

The first two research questions are important for the accuracy of the measurements and lay the groundwork for answering RQ3. In sections II A to II C, we explain in detail the structure and implementation of the OLM sequences, as well as the data collection process. In section II D, we present our operational definitions of key concepts such as assessment passing percentage and performance-avoidance strategy in the context of OLMs, and outline our analysis procedure for measuring transfer and answering the research questions. In section III, we present the results of our analysis, which are interpreted in section IV A; their implications are discussed in the rest of section IV.
II. METHODS

A. OLM Sequence Structure
The study was conducted using online learning modules (OLMs) [8, 9, 19, 20] implemented on the open-source Obojobo platform [21] developed by the Center for Distributed Learning at the University of Central Florida (UCF). Each OLM contains an assessment component (AC) and an instructional component (IC) (see Fig. 2). Students have 5 attempts on the AC, which contains 1-2 multiple-choice problems, and must make at least one attempt before being allowed to access the IC. The IC generally contains instructional text, figures, and/or practice questions. The specific contents of the IC used in each of the modules in the current study will be detailed in the next section. In an OLM sequence, a student must either pass or use up all five attempts on the AC before being allowed to access the next module. Students' interaction with each OLM can be divided into three stages: the pre-study (Pre) stage, in which a student makes one or more attempts on the AC; the study stage, in which those who failed in the Pre stage study the IC; and the post-study (Post) stage, in which students make additional attempts on the AC. A small fraction of students have also been observed to skip the study stage after multiple failed attempts in the Pre stage. A student is counted as passing an AC if the student correctly answers all problems in the AC within their first 3 attempts, including both Pre and Post stage attempts. In other words, students who either failed on all 5 attempts or passed on their 4th or 5th attempt are considered as failing the module in the current study.
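The attempt-accounting rules above can be sketched as a short Python function. This is an illustrative sketch only: the function name, the attempt-list format, and the `studied_ic` marker are our own assumptions, not the actual analysis code (which is described in Ref. [25]).

```python
def module_outcome(attempts, studied_ic):
    """Classify one student's attempts on one OLM.

    attempts:   chronological list of booleans, True = passed the AC
                on that attempt (hypothetical data layout).
    studied_ic: index of the first attempt made after opening the IC,
                or None if the student never opened the IC.

    A student passes the module only by succeeding within the first 3
    of at most 5 attempts, counting Pre and Post stage attempts together.
    """
    attempts = attempts[:5]  # the platform allows at most 5 attempts
    cut = len(attempts) if studied_ic is None else studied_ic
    pre = attempts[:cut]     # Pre-study stage: attempts before the IC
    passed_pre = any(pre[:3])
    # Post-stage pass: succeeded within the first 3 attempts overall,
    # but only after gaining access to the IC.
    passed_post = any(attempts[:3]) and not passed_pre
    return {"passed_pre": passed_pre,
            "passed_post": passed_post,
            "passed": passed_pre or passed_post}
```

For example, a student who fails the required first attempt, studies the IC, and passes on the second attempt counts as a Post-stage pass, while a student who first succeeds on the 4th attempt is counted as failing the module.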
B. Study Setup
In Fall 2017, two sequences each containing 3 OLMs (specifically, OLMs 2, 3, and 5 in Fig. 2) were assigned as homework to 235 students enrolled in a calculus-based introductory physics class at UCF [8]. The 6 modules were worth 3% of the total course credit. The first OLM sequence teaches students to solve Atwood machine type problems with blocks hanging from massive pulleys using knowledge of rotational kinematics (RK). The second sequence teaches students to solve angular collision problems, such as a girl jumping onto a merry-go-round, using knowledge of conservation of angular momentum (AM). Both sequences are designed to develop and measure students' ability to transfer problem-solving skills to slightly different contexts. The modules used in this study are free and available to the public at Ref. [22].

The AC of each OLM contains one problem that can be solved using the same physics principles as other ACs in the OLM sequence. The IC of OLM 2 (Fig. 2) contains an online tutorial developed by DeVore and Singh [23, 24], in the form of a sequence of practice questions. The IC of OLM 3 contains a worked solution to the AC problem, and the IC of OLM 5 is empty since it is intended to serve the role of a quiz.

In Fall 2018, the two OLM sequences were each modified by adding two additional OLMs (shown in Fig. 2) and implemented again in the same course, taught by the same instructor, as homework to 241 students. Both sequences were assigned as homework worth 3% of the total course credit. The first new module in each sequence is the "on-ramp" module (OLM 1 in Fig. 2), which contains an AC focusing on one or more basic procedural skills necessary for solving the subsequent ACs in the OLM sequence. For the RK sequence, the on-ramp module presents students with two Atwood machine problems of the simplest form, involving one or two blocks hanging at the same radius from a single massive pulley.
For the AM sequence, the on-ramp module addressed the common student difficulty of calculating both the magnitude and sign of the angular momentum of an object traveling in a straight line about a fixed point in space. The second new module in each sequence is the "Example 2" module (OLM 4 in Fig. 2), which contains in its AC a new problem that shares the same deep structure as the one in the previous module but differs in surface features. The IC of the module was designed in two formats: a compare-contrast format, in which students were given questions that prompted them to compare the similarity and difficulty of the solutions to the problems in AC3 and AC4, and a guided tutorial format, consisting of a series of tutorial-style scaffolding questions guiding them through the solution of the problem in AC4. Each format was provided to half of the student population at random. We found no difference between the two cohorts in terms of students' behavior and performance on the subsequent module 5 [9].
C. Data Collection and Selection
Anonymized clickstream data were collected from the Obojobo platform for all students who interacted with the OLM sequences. The following types of information were extracted from the log data following the same procedure explained in detail in Ref. [25]: the number of attempts on the AC of each module, the outcome of each attempt (pass/fail), the start time and duration of each attempt, and the start times of interaction with the IC. The duration of interaction with the IC was also extracted but was not used in the current analysis.

In addition, students' exam scores and overall course grades, each on a 0-100 scale, were also collected, anonymized, and linked to each student's log data. The exam scores consist of two midterm exams, each counting for 12% of the final course grade, and a final exam counting for 16% of the final course grade. The final course grade also contains scores from homework, lab, and classroom participation.

In order to maintain a consistent sample across our analyses, only data from students who attempted every module in a sequence at least once are included. Data from seven students were removed from the 2017 RK sequence for this reason, and two or fewer students were removed from each of the other OLM sequences. Data from 202 students were retained for the RK sequence in 2017, 198 students for the RK sequence in 2018, 198 students for the AM sequence in 2017, and 189 students for the AM sequence in 2018.

In the Fall 2017 implementation, half of the students were given the option to skip the initial AC attempt of OLM 2 (the first OLM in that implementation) and proceed directly to the tutorial in the IC. However, we found in an earlier study [8] that very few students chose to exercise this option, and among those who did there was no detectable impact on subsequent problem-solving behavior and outcomes. Therefore, in the current analysis, we combined those two groups into one. Similarly, for the Fall 2018 semester, we combined data from students encountering the two different versions of the IC in module 4, since no difference in their behavior and outcome on module 5 could be detected [9].
D. Data Analysis
To identify and estimate the number of students adopting a performance-avoidance strategy (RQ1), we will analyze the frequency of students making a very brief first attempt on each module. As explained in section I B, students who adopt this strategy are more likely to consistently guess on their first attempts to gain access to the instructional material.

In the current analysis, we categorize each student's first attempt as a "Brief Attempt" (BA) if the duration of the attempt is less than 35 seconds. This cutoff time is inherited from a careful analysis of similar OLMs in an earlier OLM study [25], and chosen as a conservative estimate of the minimum amount of time needed to read and attempt a given question. Students are categorized into three "BA groups" based on the number of BAs on the first four modules: 0-1 BAs, 2-3 BAs, and 4 BAs. Table I shows the number of students in each BA group for each OLM sequence. BAs on the quiz module were not considered since there was no IC for the students to access. Due to the conservative BA duration estimation, we believe that the 0-1 BA group contains the fewest performance-avoidance focused students, whose members are most likely to make valid first attempts on the AC.

TABLE I. The number of students in each OLM sequence by their number of Brief Attempts. The Brief Attempt groups consist of those who had 0-1, 2-3, or 4 Brief Attempts throughout the first four modules.

To examine the extent to which the behavior of performance-avoidance focused students affects the measurement of transfer (
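The Brief Attempt classification described above can be sketched in a few lines of Python (the 35-second cutoff is from the text; the function and variable names are our own, not those of the original analysis):

```python
BRIEF_CUTOFF = 35.0  # seconds; conservative cutoff inherited from Ref. [25]

def ba_group(first_attempt_durations):
    """Assign a student to a Brief Attempt (BA) group based on the
    durations (in seconds) of their required first attempts on the
    first four modules of a sequence (the quiz module is excluded)."""
    n_ba = sum(d < BRIEF_CUTOFF for d in first_attempt_durations[:4])
    if n_ba <= 1:
        return "0-1 BA"
    if n_ba <= 3:
        return "2-3 BA"
    return "4 BA"
```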
RQ2), we will compare the Pre and Post stage passing rates of the three BA groups on all modules in the two sequences, and plot the outcomes in Fig. 3. Following the convention established in two previous studies [8, 9], the pass rates are defined as follows. On each OLM except module 5, the pass rate (P) of students was calculated for both the Pre-study (P_pre) and Post-study (P_post) attempts. The Pre-study pass rate on each module is calculated as

P_pre = N_pre / N_total,    (1)

with N_pre being the number of students who passed Pre-study and N_total being the total number of students who attempted the module. Similarly, the Post-study pass rate on each module is calculated as

P_post = (N_pre + N_post) / N_total,    (2)

with N_post being the number of students who passed Post-study. By including both N_pre and N_post, the Post passing rate reflects the total number of students able to pass the assessment after being given access to the IC, assuming that students who passed in the Pre stage could also pass in the Post stage if re-tested. This definition is similar to the Post-test score in a Pre-test/Post-test setting. For module 5, the passing rate does not distinguish between Pre and Post stages because the IC of the module contains no instructional resources. The P_pre on modules 2-4 and P on module 5 measure students' ability to transfer their learning from modules 1-4. We hypothesized that the 0-1 BA group would have significantly better performance than the other two BA groups on their Pre stage attempts on modules 2, 3, and 4, because the other two BA groups are more likely to forfeit the first attempt opportunity regardless of their ability to solve the problem.
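Equations (1) and (2) amount to the following small computation (a minimal sketch; the function and argument names are ours):

```python
def pass_rates(n_pre, n_post, n_total):
    """Pre- and Post-study pass rates as defined in Eqs. (1) and (2).

    n_pre:   students who passed during Pre-study attempts
    n_post:  students who first passed during Post-study attempts
    n_total: all students who attempted the module
    """
    p_pre = n_pre / n_total
    # The Post rate is cumulative: it includes students who already
    # passed in the Pre stage, analogous to a Post-test score.
    p_post = (n_pre + n_post) / n_total
    return p_pre, p_post
```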
We further hypothesized that the Post-study pass rates for each BA group would be very similar, because P_post reflects students' ability to learn from the modules and solve the specific problem (if they are not already proficient), and the dominant factor separating the three groups is students' engagement strategy, not their ability to learn from the modules.

Finally, to examine the mechanism by which the on-ramp module improves transfer of knowledge (RQ3), we first separate the student sample from Fall 2018 into three "on-ramp cohorts":

• Pass On-Ramp Pre: students who passed the on-ramp AC before accessing the IC,
• Pass On-Ramp Post: students who passed the on-ramp AC only after accessing the IC, and
• Fail: students who did not pass the on-ramp AC within 3 attempts.

Based on the analysis outcomes for RQ1 and RQ2, we only retained data from the 0-1 BA group for this analysis, since our analysis indicates that data from the other two BA groups could result in an underestimation of students' ability to transfer.

Next, we identified three comparable cohorts of students from the 2017 sample. We first retained students who made only 0-1 BAs on modules in the 2017 sequence, then identified comparable cohorts using propensity score matching, since the general ability of the 0-1 BA group could differ from that of the rest of the student population. Propensity scores were constructed using a combination of standardized scores from two midterm exams and one final exam in both semesters. Each exam is largely identical across the two semesters, with one or two questions being replaced or modified.

FIG. 3. Students are grouped by their number of Brief Attempts throughout the OLM sequences for (a) Rotational Kinematics and (b) Angular Momentum. The pass rates of these groups in each module are plotted along with their standard error.

FIG. 4. Using propensity score matching on course exam scores, a subset of 2017 students are matched to 2018 students with 0-1 Brief Attempts. The pass rates of these two samples are then plotted separately for (a) Rotational Kinematics (RK) and (b) Angular Momentum (AM).

Pass rates on all modules in both sequences are compared between the three 2018 cohorts and the three propensity-score-matched 2017 cohorts in order to distinguish between the two possible mechanisms of the on-ramp module. If the "improve proficiency" effect was dominant, then the performance differences should be observed mostly among the Pass On-Ramp Post cohort and its matched cohort in 2017. If the "reminder" effect was dominant, then the differences will be observed for the Pass On-Ramp Pre cohort and its counterpart.

Propensity score matching was performed using R [26] and the MatchIt package [27]. The MatchIt algorithm retains all treated data and attempts to find either an exact one-to-one match or balance the overall covariate distribution for the control data. Data analysis, statistical testing, and visual analysis were conducted using R [26] and the tidyverse package [28].
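The original matching was performed with R's MatchIt; as a rough illustration of the underlying idea, a greedy one-to-one nearest-neighbor match on precomputed propensity scores can be sketched in Python. Everything here (names, data layout, greedy ordering) is a simplified assumption, not a reimplementation of MatchIt:

```python
def greedy_nn_match(treated, control):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping student ids to propensity scores
    (hypothetical layout). All treated units are retained, mirroring
    the MatchIt behavior described above.
    """
    available = dict(control)
    pairs = {}
    for sid, score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break  # more treated than control units
        # pick the control unit with the closest propensity score
        best = min(available, key=lambda cid: abs(available[cid] - score))
        pairs[sid] = best
        del available[best]  # matching is without replacement
    return pairs
```

In practice the propensity scores would typically come from a logistic regression of group membership on the standardized exam scores, and MatchIt additionally checks covariate balance after matching.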
III. RESULTS
First, we estimate the fraction of students that adopted a performance-avoidance strategy (RQ1) by listing the number of students with 0-1, 2-3, or 4 BAs on the first four modules of each sequence in Table I. The result shows that, even with relatively conservative criteria for classifying brief attempts, we still identified 10-15% of the students who made a brief attempt on each of the four modules. On the other hand, around 50% of the students belong to the 0-1 BA group. Within the 0-1 BA group, the number of students in each on-ramp cohort is listed in Table II for each OLM sequence.

Figure 3 shows the Pre and Post stage pass rates of students on modules 2-5, separated by the number of BAs on the first four modules. Pass rates from the two sequences are plotted separately: the RK sequence in Fig. 3a and the AM sequence in Fig. 3b. In both Fig. 3a and Fig. 3b, the most prominent difference between the three BA groups is that students in the 0-1 BA group significantly outperformed the other two groups in Pre stage attempts for the Example 1 module (OLM 3, Fig. 2) (Fisher's exact test on 2 × 2 contingency tables, p < .001 for the RK sequence and p = 0.001 for the AM sequence). Students in the 0-1 BA group also outperformed the 2-3 BA group on RK Tutorial Post stage attempts (p = 0.… and p = 0.…). These results support our hypothesis (RQ2) that students adopting a performance-avoidance strategy could have a measurable impact on the estimation of the transfer ability of the student population using performance data from Obojobo. Therefore, to mitigate this impact, we limit ourselves to studying the 0-1 BA group for both the 2017 and 2018 student samples in the following analysis.

We compared the pass rates of the 0-1 BA group from 2018 on modules 2-5 with a propensity-score-matched subsample in 2017 who also had 0-1 BAs on the first two modules. The pass rates for both sequences are shown in Fig. 4, while the p-values from Fisher's exact test comparing each pair of data points on the figures are listed in the first two rows of Table III. All p-values are adjusted for Type I error due to conducting multiple tests using the Benjamini-Hochberg method [29]. The data show that there are significant performance differences in the success rate between the two student populations on Tutorial Pre and Example 1 Pre attempts in the RK sequence, whereas the difference in the AM sequence is less prominent, possibly due to the success rate being very high in both samples. The differences are similar in nature but larger in magnitude compared to what was observed in our earlier study that did not consider alternative learning strategies [9], suggesting that the earlier study could have underestimated the transfer ability of the student population.

To examine the mechanism by which the on-ramp module improves the transfer of knowledge (RQ3), we divided the 2018 0-1 BA population into three cohorts. Since the Fail cohort is much smaller than the other two cohorts and too small for reliable propensity score matching, we will only analyze the Pass On-Ramp Pre and Pass On-Ramp Post cohorts (see Table II). In Fig.
5, we compare the performance of those two cohorts to their propensity-score-matched counterparts in the Fall 2017 semester. The pass rates of the two cohorts on the same module sequence are shown side by side. Data from the RK sequence are shown on the top row (Fig. 5a and Fig. 5b) and the AM sequence on the bottom row (Fig. 5c and Fig. 5d). The adjusted p-values of Fisher's exact test between each pair of points are listed in the last four rows of Table III.

It can be seen from Fig. 5 that the Pass On-Ramp Pre cohort is responsible for the majority of the differences on Pre-study attempts between the 2017 and 2018 samples. For the RK sequence, the differences are not statistically significant for the Pass On-Ramp Post cohort after p-value adjustment [29]. For the AM sequence, none of the differences were statistically significant after p-value adjustment for either the Pass On-Ramp Pre or Pass On-Ramp Post cohorts.

TABLE II. The number of students in each OLM sequence that fall into each on-ramp cohort among those with 0-1 Brief Attempts. The cohorts consist of those who passed during on-ramp Pre-study attempts ("Pass On-Ramp Pre"), those who passed during on-ramp Post-study attempts ("Pass On-Ramp Post"), and those that failed the on-ramp assessment ("Fail"). Since the on-ramp module was only included in Fall 2018, only students from 2018 are included here.

OLM Sequence | Pass On-Ramp Pre | Pass On-Ramp Post | Fail
RK           | 32               | 57                | 11
AM           | 32               | 47                | 12
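The Benjamini-Hochberg adjustment [29] applied to the p-values above is a standard step-up procedure; a generic sketch in Python (the actual analysis used R, so this is not the original code) is:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control).

    The adjusted value for the k-th smallest p-value is
    min over j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for pos in range(m - 1, -1, -1):  # from the largest p-value down
        i = order[pos]
        running_min = min(running_min, pvals[i] * m / (pos + 1))
        adjusted[i] = min(running_min, 1.0)
    return adjusted
```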
IV. DISCUSSION

A. Interpretation of results
We found that roughly half of the students frequently or consistently adopted a learning strategy that is likely motivated by a performance-avoidance goal: making abnormally short required first attempts on some or all of the first four modules. These brief attempts could have been generated by students who were either guessing or copying the answer from a peer. While an occasional brief attempt may indicate a lack of confidence in one's knowledge, continuous brief attempts on multiple modules are more likely a strategic choice to save time on the task, since the performance differences on attempts after studying the learning material are much smaller. This strategy fits well with Boekaerts's description of students being in a "coping mode," in which their goal is to pass the module while saving time and avoiding "unnecessary" possible failures [18].

For students who adopted the performance-avoidance strategy, their transfer ability can no longer be measured using OLMs, as their brief Pre-study attempts on the following modules do not always reflect their true ability to transfer their learning from the current module. Our analysis suggests that including data from those students resulted in an underestimation of students' ability to transfer knowledge from the Tutorial module (module 2) to the Example 1 module (module 3) in our earlier study, although most of the qualitative conclusions remain the same. However, it must be clarified that the current analysis is also not an accurate measurement of the transfer ability of the entire student population. Instead, it is an accurate measurement for the subpopulation who did not frequently guess on their initial attempts.

An alternative explanation of our observation is that students who frequently adopt the strategy have a lower level of overall mastery of the subject (and possibly a higher level of self-awareness of their lack of knowledge).
Therefore, they would not have been able to pass the required Pre stage attempt even if they had tried, and thus including those students would not result in an underestimation of students' transfer ability. However, while this may be true for some students, we do not think that this explanation applies to the majority of students in the 2-3 BA and 4 BA groups. This is because their performance on modules 2, 4, and 5, as well as on the Post stage of the Example 1 module, is either similar to that of the 0-1 BA group or only slightly worse, which suggests that their overall physics abilities are similar, and that the differences observed in the Pre stage attempts on the Example 1 module are therefore mostly due to differences in strategic choice.

Another major finding of the current analysis is that the benefit of the on-ramp module in facilitating transfer (as measured by Pre stage attempts of subsequent modules) predominantly occurs among students who can pass the on-ramp module before accessing the instructional component. The difference is much more prominent for the more challenging RK sequence, and less so for the easier AM sequence. This unexpected observation holds true even after we used propensity score matching between the two semesters to control for the fact that the Pass On-Ramp Pre cohort likely includes students with better physics knowledge or higher motivation than students in the Pass On-Ramp Post cohort.

A possible explanation comes from the basic principles of information processing theory [30, 31]. For students who already possess the essential skills or procedures, attempting the on-ramp module assessment prompted them to retrieve those skills from long-term memory and retain them in working memory.
All or part of those skills remained either in working memory or in a more active state when the students moved on to the subsequent modules, thereby freeing up cognitive capacity for those students to better comprehend the additional complexity of the Tutorial and Example 1 modules. On the other hand, for those who had not yet mastered those essential skills, the instructional component (IC) of the on-ramp module was sufficient for them to pass the assessment, but not enough for them to achieve a higher level of
TABLE III. A list of p-values from Fisher's exact test comparing the performance of 2018 students and matched 2017 students on each common assessment in the listed figure. The p-values have been adjusted using the Benjamini-Hochberg method [29].

Fig.   Tutorial Pre   Tutorial Post   Example 1 Pre   Example 1 Post   Quiz
4a     0.003          0.330           <0.001          <0.001           0.054
4b     0.333          0.265           0.166           0.306            0.166
5a     0.001          1.000           0.001           0.395            0.438
5b     0.498          1.000           0.008           0.028            0.028
5c     0.764          0.766           0.267           1.000            0.766
5d     0.835          0.835           0.835           0.835            0.835
[Figure 5 panels: pass rate ("Percentage of Students who Pass") on each assessment.]
(a) Rotational Kinematics: Matching 2018 Pass On-Ramp Pre students.
(b) Rotational Kinematics: Matching 2018 Pass On-Ramp Post students.
(c) Angular Momentum: Matching 2018 Pass On-Ramp Pre students.
(d) Angular Momentum: Matching 2018 Pass On-Ramp Post students.
FIG. 5. Using propensity score matching on course exam scores, a subset of 2017 students is matched to 2018 students with 0-1 Brief Attempts in either the Pass On-Ramp Pre [(a) and (c)] or Pass On-Ramp Post [(b) and (d)] cohorts. The pass rates of these two cohorts are plotted separately for Rotational Kinematics (RK) [(a) and (b)] and Angular Momentum (AM) [(c) and (d)].

proficiency. Therefore, activating those skills on the subsequent modules required a higher amount of cognitive load, limiting students' ability to process the additional complexities.

A straightforward and testable implication of this explanation is that providing students with more practice problems on those essential skills will increase their ability to learn and transfer on subsequent modules. In addition, it may be beneficial to distribute those practice opportunities rather than clustering them immediately prior to the tutorial sequence, as distributed practice has been shown to benefit skill acquisition and recall [32, 33], and distributed retrieval practice of factual knowledge has been shown to improve students' physics exam scores [34].

It must be pointed out that our use of propensity score matching to control for the fact that our selected student populations likely have different knowledge and motivation than the rest of the population is far from perfect, since overall exam scores may not fully reflect knowledge of the specific topic involved. A more accurate propensity score could be constructed in the future, when additional modules on the same topic are created and assigned to students prior to the tutorial sequence. Such modules have been created and administered in the Fall 2019 semester, enabling more accurate analysis to be conducted in the future.
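The matching step itself (the paper reports using the MatchIt package in R [27]) reduces, in the single-covariate case, to pairing each 2018 student with the unmatched 2017 student whose course exam score is closest: with one covariate, the logistic propensity score is monotone in that covariate, so matching on the raw score yields the same pairs. A minimal greedy sketch, using made-up exam scores rather than the study's data:

```python
def greedy_match(treated, control):
    """1:1 nearest-neighbor matching without replacement on a single
    scalar covariate (e.g., overall course exam score).
    Returns a list of (treated_index, control_index) pairs."""
    available = set(range(len(control)))
    pairs = []
    # Match treated units in a fixed order; production implementations
    # often sort by propensity score first and offer a caliper.
    for i, t in enumerate(treated):
        if not available:
            break
        j = min(available, key=lambda k: abs(control[k] - t))
        pairs.append((i, j))
        available.remove(j)  # without replacement
    return pairs

# Hypothetical exam scores (0-100): 2018 cohort (treated) vs. the
# larger 2017 pool (control).
scores_2018 = [88, 75, 92, 60]
scores_2017 = [74, 90, 61, 85, 70, 95]
print(greedy_match(scores_2018, scores_2017))
# -> [(0, 1), (1, 0), (2, 5), (3, 2)]
```

Greedy order-dependent matching is only a sketch of the idea; MatchIt's nearest-neighbor method additionally estimates a propensity model and reports balance diagnostics, which matter when more covariates than a single exam score are used.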
B. Implications for Online Education Research
Our analysis shows that students' behaviors in a self-regulated online learning environment frequently deviate from what was intended or expected by the instructor. Those unexpected behaviors, such as frequently guessing on problems (or, in some cases, cheating), can have a substantial impact on the outcomes of data analysis if not properly accounted for. Excluding students with unexpected behavior improves the accuracy of the measurement, but also limits the measurement to only those who interacted with online learning resources as expected. However, this should not be seen as a limitation that is unique to online education research, since students completing not-for-credit paper-and-pencil assessments can also adopt avoidance-goal-oriented strategies. In fact, the ability to detect the presence of diverse student behavior, and to correct for its impact in data analysis, is a unique strength of online education research. It can also motivate and facilitate the future development of instructional strategies to reduce performance-avoidance strategies among students in an online environment.

Furthermore, in our earlier analysis [9] of the same module sequences, we found that instructional resources designed based on well-documented learning science principles may not always generate the expected outcomes, due to variations in the actual implementation. The current analysis further reveals that even when an instructional resource did result in the expected outcome improvement, the underlying mechanism may be different from what was expected. In this case, modules that were designed to train students' proficiency in essential skills actually benefited those who were already proficient and did not go through the training, by serving as a reminder for them to activate those skills. These results demonstrate the high level of complexity and unpredictability involved in designing and creating effective instructional resources.
Moreover, they highlight the importance of discipline-based education researchers' role as "Education Engineers" who try to bridge the gap between learning theories and actual instructional practices.

Last but not least, the current study is an exploratory attempt at evaluating the effectiveness of instructional materials by comparing the outcomes of students enrolled in two consecutive semesters and controlling for extrinsic variance using propensity score matching. Compared to the more common method of conducting randomized AB experiments [35, 36], the current method is significantly easier to implement in actual classroom settings and introduces fewer disruptions for students. In addition, this method allows for a larger sample size, since each group consists of an entire class rather than a fraction of the class. While it introduces more variance due to the treatment and control groups coming from different semesters, we demonstrated that the impact of that variance can be controlled to some extent by methods such as propensity score matching. This less disruptive study setup can be particularly valuable in certain situations, such as during the current COVID-19 outbreak, which presents students with many obstacles as institutions shift to fully remote instruction and instructors are reluctant to introduce more potential sources of confusion.
ACKNOWLEDGMENTS
The authors would like to thank the Learning Systems and Technology team at UCF for developing the Obojobo platform. Dr. Michelle Taub provided critical and insightful comments on students' self-regulated learning. This research is partly supported by NSF Grants DUE-1845436 and DUE-1524575 and the Alfred P. Sloan Foundation Grant G-2018-11183.

[1] J. D. Bransford and D. L. Schwartz, "Rethinking transfer: A simple proposal with multiple implications," Review of Research in Education, 61-100 (1999).
[2] H. S. Broudy, "Types of knowledge and purposes of education," in Schooling and the Acquisition of Knowledge, edited by Richard C. Anderson, Rand J. Spiro, and William E. Montague (Routledge, 1977) pp. 1-17.
[3] Douglas K. Detterman, "The case for the prosecution: Transfer as an epiphenomenon," in Transfer on Trial: Intelligence, Cognition, and Instruction, edited by D. K. Detterman and R. J. Sternberg (Ablex Publishing, 1993) pp. 1-24.
[4] David Hestenes, Malcolm Wells, and Gregg Swackhamer, "Force concept inventory," The Physics Teacher, 141-158 (1992).
[5] Ronald K. Thornton and David R. Sokoloff, "Assessing student learning of Newton's laws: The force and motion conceptual evaluation and the evaluation of active learning laboratory and lecture curricula," American Journal of Physics, 338-352 (1998).
[6] Andrew Pawl, Analia Barrantes, Carolin Cardamone, Saif Rayyan, and David E. Pritchard, "Development of a mechanics reasoning inventory," AIP Conference Proceedings, 287-290 (2012).
[7] Jeffrey Marx and Karen Cummings, "Development of a survey instrument to gauge students' problem-solving abilities," AIP Conference Proceedings, 221-224 (2010).
[8] Zhongzhou Chen, Kyle M. Whitcomb, and Chandralekha Singh, "Measuring the effectiveness of online problem-solving tutorials by multi-level knowledge transfer," in Physics Education Research Conference 2018, PER Conference, edited by A. Traxler, Y. Cao, and S. Wolf (Physics Education Research Topical Group and the American Association of Physics Teachers, Washington, DC, 2018).
[9] Zhongzhou Chen, Kyle M. Whitcomb, Matthew W. Guthrie, and Chandralekha Singh, "Evaluating the effectiveness of two methods to improve students' problem solving performance after studying an online tutorial," in Physics Education Research Conference 2019, PER Conference (Physics Education Research Topical Group and the American Association of Physics Teachers, Provo, UT, 2019).
[10] Brendon D. Mikula and Andrew F. Heckler, "Framework and implementation for improving physics essential skills via computer-based practice: Vector math," Phys. Rev. Phys. Educ. Res., 010122 (2017).
[11] Nicholas T. Young and Andrew F. Heckler, "Observed hierarchy of student proficiency with period, frequency, and angular frequency," Phys. Rev. Phys. Educ. Res., 010104 (2018).
[12] Manu Kapur, "Productive failure in mathematical problem solving," Instructional Science, 523-550 (2009).
[13] Andrew J. Elliot and Kou Murayama, "On the measurement of achievement goals: Critique, illustration, and application," Journal of Educational Psychology, 613-628 (2008).
[14] Andrew J. Elliot and Holly A. McGregor, "A 2 × 2 achievement goal framework," Journal of Personality and Social Psychology, 501-519 (2001).
[15] Paul R. Pintrich, "The role of goal orientation in self-regulated learning," in Handbook of Self-Regulation, edited by Monique Boekaerts, Paul R. Pintrich, and Moshe Zeidner (Academic Press, San Diego, 2000) pp. 451-502.
[16] Paul R. Pintrich, "A conceptual framework for assessing motivation and self-regulated learning in college students," Educational Psychology Review, 385-407 (2004).
[17] Philip H. Winne, "Self-regulated learning," in International Encyclopedia of the Social & Behavioral Sciences, Vol. 21, edited by J. D. Wright (Elsevier, Oxford, UK, 2015) 2nd ed., pp. 535-540.
[18] Monique Boekaerts and Markku Niemivirta, "Self-regulated learning: Finding a balance between learning goals and ego-protective goals," in Handbook of Self-Regulation, edited by Monique Boekaerts, Paul R. Pintrich, and Moshe Zeidner (Academic Press, San Diego, 2000) pp. 417-450.
[19] Zhongzhou Chen, Geoffrey Garrido, Zachary Berry, Ian Turgeon, and Francisca Yonekura, "Designing online learning modules to conduct pre- and post-testing at high frequency," in Physics Education Research Conference 2017, PER Conference (Physics Education Research Topical Group and the American Association of Physics Teachers, Cincinnati, OH, 2017) pp. 84-87.
[20] Zhongzhou Chen, Sunbok Lee, and Geoffrey Garrido, "Re-designing the structure of online courses to empower educational data mining," in Proceedings of the 11th International Conference on Educational Data Mining, EDM 2018, Buffalo, NY, USA, July 15-18, 2018 (2018).
[21] Zachary Berry, Ian Turgeon, and Francisca Yonekura, Obojobo Next (2020).
[22] Kyle M. Whitcomb, Matthew W. Guthrie, and Zhongzhou Chen, Online tutorial sequences for 2017-2018 experiments (2020).
[23] Seth DeVore, Emily Marshman, and Chandralekha Singh, "Challenge of engaging all students via self-paced interactive electronic learning tutorials for introductory physics," Physical Review Physics Education Research, 010127 (2017).
[24] Chandralekha Singh and Daniel Haileselassie, "Developing problem-solving skills of students taking introductory physics via web-based tutorials," Journal of College Science Teaching, 42-49 (2010).
[25] Zhongzhou Chen, Mengyu Xu, Geoffrey Garrido, and Matthew W. Guthrie, "Relationship between students' online learning behavior and course performance: What contextual information matters?" Phys. Rev. Phys. Educ. Res., 010138 (2020).
[26] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2019).
[27] Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart, "MatchIt: Nonparametric preprocessing for parametric causal inference," Journal of Statistical Software, Articles, 1-28 (2011).
[28] Hadley Wickham, tidyverse: Easily Install and Load the 'Tidyverse' (2017), R package version 1.2.1.
[29] Yoav Benjamini and Yosef Hochberg, "Controlling the false discovery rate: A practical and powerful approach to multiple testing," Journal of the Royal Statistical Society: Series B (Methodological), 289-300 (1995).
[30] John Sweller, Paul Ayres, and Slava Kalyuga, Cognitive Load Theory (Springer New York, 2011).
[31] Herbert A. Simon, "Information-processing theory of human problem solving," in Handbook of Learning & Cognitive Processes: V. Human Information (Lawrence Erlbaum, 1978) Chap. 1, pp. 271-295.
[32] John Dunlosky, Katherine A. Rawson, Elizabeth J. Marsh, Mitchell J. Nathan, and Daniel T. Willingham, "Improving students' learning with effective learning techniques: Promising directions from cognitive and educational psychology," Psychological Science in the Public Interest, 4-58 (2013).
[33] Charles Henderson, José P. Mestre, and Linda L. Slakey, "Cognitive science research can improve undergraduate STEM instruction: What are the barriers?" Policy Insights from the Behavioral and Brain Sciences, 51-60 (2015).
[34] Vegard Gjerde, Bodil Holst, and Stein Dankert Kolstø, "Retrieval practice of a hierarchical principle structure in university introductory physics: Making stronger students," Physical Review Physics Education Research (2020), 10.1103/physrevphyseducres.16.013103.
[35] Zhongzhou Chen and Gary Gladding, "How to make a good animation: A grounded cognition model of how visual representation design affects the construction of abstract physics knowledge," Phys. Rev. ST Phys. Educ. Res., 010111 (2014).
[36] Zhongzhou Chen, Christopher Chudzicki, Daniel Palumbo, Giora Alexandron, Youn-Jeng Choi, Qian Zhou, and David E. Pritchard, "Researching for better instructional methods using AB experiments in MOOCs: Results and challenges," Research and Practice in Technology Enhanced Learning 11 (2016).