Investigating students' behavior and performance in online conceptual assessment
Bethany R. Wilcox and Steven J. Pollock
Department of Physics, University of Colorado, 390 UCB, Boulder, CO 80309
Historically, the implementation of research-based assessments (RBAs) has been a driver of educational change within physics and helped motivate adoption of interactive engagement pedagogies. Until recently, RBAs were given to students exclusively on paper and in class; however, this approach has important drawbacks including decentralized data collection and the need to sacrifice class time. Recently, some RBAs have been moved to online platforms to address these limitations. Yet, online RBAs present new concerns such as student participation rates, test security, and students' use of outside resources. Here, we report on a study addressing these concerns in both upper-division and lower-division undergraduate physics courses. We gave RBAs to courses at five institutions; the RBAs were hosted online and featured embedded JavaScript code which collected information on students' behaviors (e.g., copying text, printing). With these data, we examine the prevalence of these behaviors, and their correlation with students' scores, to determine if online and paper-based RBAs are comparable. We find that browser loss of focus is the most common online behavior, while copying and printing events were rarer. We found that correlations between these behaviors and student performance varied significantly between introductory and upper-division student populations, particularly with respect to the impact of students copying text in order to utilize internet resources. However, while the majority of students engaged in one or more of the targeted online behaviors, we found that, for our sample, none of these behaviors resulted in a significant change in the population's average performance that would threaten our ability to interpret this performance or compare it to paper-based implementations of the RBA.
I. INTRODUCTION AND MOTIVATION
Research-based assessments (RBAs) have become a cornerstone of physics education research (PER) due in large part to their ability to provide a standardized measure of students' learning that can be compared across different learning environments or curricula [1]. As such, these assessments are a critical step along the path towards making evidence-based decisions with respect to teaching and student learning. RBAs have historically been a strong driver in promoting the need for, and adoption of, educational reforms in undergraduate physics courses (e.g., [2-4]). It can be argued that, without the invention and consistent use of RBAs, the PER community might not have the same focus on active learning and interactive engagement that it does today.

However, despite their value, there are a number of barriers to wide-scale implementation of RBAs that stand in the way of their integration into physics departments [5, 6]. For example, most of the existing RBAs require that an instructor sacrifice 1-2 full class periods to administer the RBA pre- and post-instruction. For many instructors feeling pressure to cover as much content as possible over the course of a semester, this sacrifice is difficult to justify. In addition to the demand for class time, instructors must also sacrifice valuable time outside of class to analyze their students' performance. Many instructors are not experts in assessment and struggle with analysis and interpretation of their students' scores. This can make faculty particularly reluctant to sacrifice class time to an assessment from which they are ultimately unable to identify actionable results.

Recently, physics education researchers have attempted to address both of these challenges by shifting RBAs to online platforms (e.g., [5, 7-9]). Hosting the RBAs online allows instructors to assign the RBA for students to complete outside of class, freeing them from the need to sacrifice class time. Moreover, the online platform allows for easy standardization and centralization of the data collection and analysis process. This has two major advantages for the instructor. By automating the analysis of students' responses, these centralized systems make it so that the instructor no longer needs to perform this analysis themselves. Moreover, centralizing data collection ensures the aggregation of comparison data that can be used to facilitate meaningful comparisons and can help instructors, and researchers, to identify actionable implications from their students' performance. However, while these online systems have considerable potential for encouraging more widespread use of RBAs by removing barriers to their use, they bring with them a number of other concerns, particularly around the potential for reduced participation rates, students' use of outside resources, potential for distraction, and breaches of test security [6].

Here, we build on prior work to investigate the extent to which these concerns factor into students' scores when completing standardized physics conceptual assessments online. We include data from both introductory and upper-level contexts as there are significant differences between the student populations at these two levels, which could have implications for how students engage with an online assessment. In the next section (Sec. II), we describe prior work around online conceptual assessments. We then discuss the context and methods used in this study (Sec. III), and present our findings with respect to students' online behaviors when taking the RBAs as well as how these behaviors correlate with their overall performance (Sec. IV). Finally, we end with a discussion of our conclusions, limitations, and implications of the study (Sec. V).
II. BACKGROUND
Significant prior work has been done to address some of the concerns around online conceptual assessment as part of the Learning Assistant Student Supported Outcomes (LASSO) study. Specifically, the LASSO researchers investigated concerns about changes in scores and participation rates between online and paper-based administrations of the Force Concept Inventory (FCI) [10] and Conceptual Survey of Electricity and Magnetism (CSEM) [11] in the context of introductory courses. They found that, when looking at all courses in aggregate, participation rates tended to be lower for online RBAs [12]. However, this difference between the two formats vanished when best practices were used for the online implementations. The best practices identified in the LASSO study include multiple email and in-class reminders, and offering participation credit for completing both the pre- and post-tests. Moreover, they also found that, when participation rates were similar, students' overall performance was also statistically comparable [13].

Historically, multiple researchers have addressed the issue of the comparability of online assessments both within and outside of PER. For example, MacIsaac et al. [6] also found no difference in students' scores on the FCI between web-based and paper-based administrations. In addition to investigating overall score, they also saw no difference in performance on individual items and no difference based on students' gender. However, while studies within PER have consistently indicated that there is no difference in performance between online and paper-based RBAs, the results from outside PER are more varied. Many studies have documented no statistically significant difference in students' performance between online and paper-based multiple-choice tests (e.g., [14, 15]), while others have reported cases where students scored statistically higher or lower on online tests than on associated paper-based tests (see Refs. [15, 16] for reviews). The variation in these studies has led to the recommendation that, while online and paper-based tests can be equivalent, it should not be assumed that they are equivalent until it has been clearly demonstrated that they actually are [16].

A smaller body of work has focused specifically on investigating the validity of concerns about students' use of outside resources (e.g., the internet or other students) or breaches in test security. For example, Haney and Clark [17] collected timing and path data from students who took a series of online quizzes over a one-semester course. They analyzed patterns in students' responses (e.g., similarities in two students' response patterns combined with close timing of the two submissions) to identify likely cases of students collaborating on the assignments. They found that this type of collaboration increased as the semester went on and students adapted to the online quiz format. They also asked students to self-report whether they collaborated with others for the quizzes and found students reported collaboration with similar frequencies to what was detected in the response patterns.

Another study, conducted in the context of an introductory astronomy course, looked at different online student behaviors when taking an online conceptual assessment [18].
In this study, Bonham [18] used JavaScript and other applets to detect when students engaged in behaviors like printing browser pages, copying or highlighting text, and switching into other browser windows while taking an online astronomy concept assessment. They found no instances of students printing pages, and only 6 cases (out of 559) that they deemed were probable instances of students copying text. Students switching browser windows was more common; however, Bonham argued these events appeared random and were not systematically associated with particular questions. There were several important limitations to Bonham's study. In browsers other than Microsoft Internet Explorer, copy events and save events were detected through the proxies of highlighting text and page reloads, respectively. As Bonham noted, highlighting text as a proxy for copying results in many false positives, and there was no discussion of how these behaviors related to performance on the RBA. Here, we replicate and extend the study by Bonham in the context of physics courses at both the upper-division and introductory levels.
III. CONTEXT & METHODS
Four different physics RBAs were used in this study: two upper-division and two introductory. The two upper-division RBAs used in this study were the Quantum Mechanics Conceptual Assessment (QMCA) [19] and the Colorado Upper-division Electrostatics Diagnostic (CUE) [20]. Both the QMCA and CUE are multiple-choice or multiple-response assessments targeting content from the first semester of a two-semester, junior-level sequence in quantum mechanics and in electricity and magnetism, respectively. The two introductory assessments used were the Force and Motion Conceptual Evaluation (FMCE) [21] and the Brief Electricity and Magnetism Assessment (BEMA) [22]. The FMCE and BEMA are both multiple-choice assessments targeting content from the first and second semester, respectively, of a two-semester, calculus-based introductory physics course. All four RBAs were administered online, using the survey platform Qualtrics, during the final week of the regular semester. The online versions of the RBAs were designed to mirror the paper versions as faithfully as possible. For example, each separate page on the paper versions was offered as a separate browser page in the online version. Students could also navigate freely both forward and backward within the assessment, as they would be able to with a paper exam packet.

TABLE I. Overall participation rates for the introductory and upper-division populations. Participation rates ("Rate") represent the percentage of the total course enrollment for which we collected valid responses (N valid) in the final data set. Individual course participation rates varied from 70% to 100%.

                   N valid   N roster   Rate
  Introductory       1287      1543     83.4%
  Upper-division      308       336     90.2%

Student responses were collected from 2 introductory courses and 10 upper-division courses at eight institutions. All eight institutions are four-year universities spanning a range of types, including three doctoral-granting institutions classified as very high research, two masters-granting institutions (one classified as Hispanic-serving), and three bachelors-granting institutions. The authors taught two of the upper-division courses, and the remaining instructors volunteered. In all cases, the instructors offered regular course credit to their students for simply completing the RBA (independent of performance). In most cases, students received multiple in-class reminders to complete the assessment. After elimination of responses that were marked as invalid (e.g., due to too many blanks; less than 3% of non-duplicate responses were identified as invalid), participation rates by course varied from 70% to 100% over all 12 courses. Since the goal of this study is not to contrast courses at the same level, the remainder of the analysis will aggregate all introductory and upper-division students separately.

The breakdown of overall participation rates is given in Table I. These rates are somewhat higher than what has been observed for post-test participation in either paper-based or online RBAs in previous studies. For example, the LASSO study found for their introductory population an average post-test response rate of 66% for paper-based RBAs and 50% for online RBAs [12]. We also have historical participation rates available for three of the upper-division courses and both of the introductory courses in the data set. Average historical participation rates for these courses were between 60-85% for both the upper-division and introductory courses. These same courses saw participation rates between 79-97% in the current data set, suggesting that the participation rate for these courses actually increased somewhat when the RBA was given online. The fact that the instructors in our data set offered a meaningful amount of regular course credit for students who participated in the RBA, independent of their performance, likely contributed significantly to this increase in participation. We do not have consistent access to data on the racial or gender distributions for the students in our data set and thus do not report this breakdown here.

On the first page of the assessment, students were instructed to complete the RBA in one sitting without the use of outside resources such as notes, textbooks, or Google. To capture students' online behaviors, we embedded JavaScript code into the online prompts to look for instances of students copying text, printing from their browser, or clicking into another browser window. For the upper-division population, the code only recorded when a student copied text, but did not record what text was copied. However, for the introductory population, we also collected data on what question the text was copied from.
In all cases, these behaviors were time stamped to determine when each action occurred and how many times each student exhibited that behavior. This JavaScript code could only detect activities that happen at the browser level; activities at the computer level (e.g., taking a screenshot or clicking into another program) were not recorded by the code. While such data would be useful, nearly all modern browsers have security features that prevent cookies and scripts in browsers from collecting information on activities happening outside the current browser window.

For browser print commands (e.g., "control-p") and copy text commands (e.g., "control-c"), the primary data collected were when and how often these commands were issued. Data on browser focus were somewhat more complex. The code was designed to listen for a change in browser focus, and then record whether the RBA tab was visible 4 seconds after the browser focus event occurred. This allows for a variety of patterns in focus data as students click in and out of the browser tab, which they sometimes did rapidly and repeatedly. However, in general, if a student clicks into a new browser tab and stays in that tab for more than 4 seconds, the code would record a single browser focus event and tag it "hidden." A "hidden" browser focus event most often means that the student left the RBA without returning to it within 4 seconds. Alternatively, if the student clicked into another browser tab and then clicked back into the RBA within 4 seconds (and remained there for more than 4 seconds), the code would record two browser focus events, one for the click out and one for the click in, and would tag both as "visible." A single "visible" browser focus event most often means that the student returned to the RBA for more than 4 seconds after having left it for any amount of time.

In addition to the data on students' online behaviors, we collected students' scores on the assessment in order to compare the prevalence of students' behaviors with their performance on the RBA. Total time spent on the assessment was approximated using the time elapsed between the time the student clicked on the link for the RBA and the time they submitted it. As we discuss in more detail in the next section, this duration is only approximate because it does not account for time the student might have spent away from the RBA (e.g., in another browser tab, or not on their computer).
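For concreteness, the following is a minimal sketch of the kind of browser-level listeners described above. It is illustrative rather than the actual code embedded in the Qualtrics surveys; the logEvent helper and the logged fields are assumptions made for this sketch.

  // Minimal sketch (illustrative, not the study's actual embedded code) of
  // browser-level listeners for copy, print, and focus events.
  function logEvent(type, detail) {
    // In practice this record would be written into the survey's response
    // data (e.g., an embedded data field); here it simply goes to the console.
    console.log(JSON.stringify({ type: type, detail: detail, t: Date.now() }));
  }

  // Copy commands (e.g., control-c): record when and how often they occur.
  document.addEventListener('copy', function () {
    logEvent('copy', { page: window.location.pathname });
  });

  // Print commands (e.g., control-p or the browser's print dialog).
  window.addEventListener('beforeprint', function () {
    logEvent('print', {});
  });

  // Browser focus: when visibility changes, wait 4 seconds and then record
  // whether the assessment tab is hidden or visible, mirroring the tagging
  // scheme described in the text.
  document.addEventListener('visibilitychange', function () {
    setTimeout(function () {
      logEvent('focus', { state: document.hidden ? 'hidden' : 'visible' });
    }, 4000);
  });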
IV. RESULTS
Here, we examine data on students' behaviors on online RBAs to determine how prevalent specific online behaviors were for this population of students. We also examine correlations between these behaviors and students' performance on the assessments overall.
A. Print Events
The primary concern associated with students printing or saving RBAs is that these students might publicly post the assessment and thus breach its security by making it available to other students. Because the online RBAs were designed to mirror the paper-based versions, each had 10-15 individual pages that the students would work through to see all questions. This means that to present a significant threat to the security of the assessment, a student would need to print each page of the assessment separately, and in so doing, would register multiple print commands. To determine the prevalence of students printing their browser page, we include responses from the full data set, including responses marked as "invalid" from, for example, students who did not ultimately submit the RBA. In this full data set of 1879 student responses, only five (two from the introductory population and three from the upper-division) had recorded print events. Of these, three students, all from the upper-division population, had multiple print commands consistent with having saved all or the majority of the assessment pages. The remaining two, both from the introductory population, had only 1-2 distinct print events, meaning they could, at most, have saved only a small number of questions. It could be that after beginning the process of saving the questions, these students realized the process would require saving each page of the assessment individually and gave up.

Print commands themselves do not necessarily indicate a student who is intending to breach the security of the assessment. In fact, one of the instructors (SJP) reported interacting with a student during help hours in which the student pulled up screenshots of the assessment which he had taken to study from after the fact. The student made no attempt to hide the screenshots and was upfront about his motivation for taking the screenshots as a study tool. Moreover, even if a student did post the RBA prompts online, without corresponding solutions, which were never released to the students, it is not clear that access to the RBA prompts alone actually represents a significant threat to the assessment's security or validity. Additionally, as is standard for paper-based assessments, the formal names or acronyms for the assessments (e.g., the CUE) were not provided to students in the online versions.

To test for any immediate security breaches of the assessments, we Googled the prompts for each question on all four RBAs used in this study several weeks after the assessments had closed. The results of these searches varied significantly for the introductory and upper-division assessments. For the two upper-division assessments (the CUE and QMCA), there was no indication that the item prompts or their solutions had been uploaded in a way that ranked high in Google's listing. However, as Google's algorithm can change based on search patterns, it is likely necessary to do this type of verification periodically to ensure no solutions have surfaced. In several cases, Googling the item prompts pulled up PER publications on the test itself, and some of these publications included supplemental material which contained the grading rubrics for the assessment in one form or another (open-ended or multiple-choice).
It is worth noting that in all cases, these rubrics were buried at the end of a long publication or thesis and not clearly marked, and it is not clear if a student who was unfamiliar with the specific publications (or the nature of academic publication more generally) would be able to locate the rubrics without considerable persistence. However, this suggests that the greatest threat to the security of the upper-division RBAs in an online format may actually be our own publications, combined with the fact that the premier PER publication venue is open access.

Attempts to Google the prompts to the two introductory RBAs (the FMCE and BEMA), however, yielded very different results. Searching prompts for items on these assessments pulls up images of the exact prompts from the assessment, and accompanying solutions are available on paid solution sites like Chegg or Course Hero. Any student with an existing subscription to these sites would likely be able to find solutions to the FMCE or BEMA questions with relative ease. These solutions predate this study, and thus represent breaches of security that occurred previously. The larger online presence of both the introductory RBA prompts and solutions has at least two possible contributing factors. First, introductory (and largely non-physics major) students may be more likely to engage in behaviors that facilitate quick completion of online assignments rather than prioritizing deep learning of the material. Thus, they may be more likely to look for, and share, course materials online. Second, both the FMCE and BEMA are considerably older and more extensively used than the CUE and QMCA. It may be that solutions to any RBA will eventually make their way online given sufficient time and use, and that the CUE and QMCA are not old enough or common enough to have achieved a significant online presence. We will discuss additional implications of this pattern in Sec. IV C.
B. Browser Focus Events
Online RBAs introduce a potential for students to become disengaged from the assessment in a way that is less likely in paper-based administrations. Loss of browser focus is one proxy for students disengaging from the RBA.
TABLE II. Duration and number of sustained browser hidden events in the introductory and upper-division student populations. For reference, the total number of valid responses in the introductory and upper-division data sets was N = 1287 and N = 308, respectively.

                                                        Introductory                       Upper-division
  Total number of students with 1 or more focus event        562                                 159
  Number of focus events per student                  Median - 2 (Max - 43)             Median - 2 (Max - 59)
  Number of students with only 1 focus event                  219                                  66
  Number of students with 10 or more focus events              91                                  20
  Total number of focus events                                2860                                 725
  Duration of focus events                      Median - 21 sec (Max - 66.7 hrs)    Median - 34 sec (Max - 29.3 hrs)
  Number of focus events less than 1 min                      2264                                 479
  Number of focus events greater than 5 min                    149                                  70

Focus events were the most common events in the data set, with roughly half of the students (46%, N = 562 of 1287 in introductory; 52%, N = 159 of 308 in upper-division) having at least one browser focus event in which their RBA window became hidden for more than 4 seconds. For these students, we examined trends in the number and duration of browser focus events by grouping them to isolate sustained changes in browser visibility (a sketch of this grouping appears at the end of this subsection). In other words, if a student's survey page becomes hidden, how long is it before it becomes visible again, independent of whether there are additional browser hidden events in between (indicating that the student clicked back into the survey window, but did not remain there for more than 4 seconds)? Here, we report median and maximum duration, as the presence of even a small number of outliers makes the average less meaningful. Table II reports information on the number and duration of browser focus events in the data set. While Table II reports data for the introductory and upper-division students separately, the trends are comparable between the two levels. These trends suggest that a large fraction of students in the data set did click out of the assessment tab one or more times while taking the RBA; however, roughly two-thirds of the time they were away from the RBA for no more than 1 minute, and less than 10% left the assessment for longer than 5 minutes. Moreover, just over a third of students left the assessment only once. In our experience implementing assessments like these in in-class environments, this frequency and time frame are generally comparable to how long a student might "space out" while taking the RBA during class.

We also examined whether the appearance or duration of loss of focus events correlated with students' scores on the assessment. In as much as browser loss of focus could be a proxy for distraction, it might be guessed that students with loss of focus events would score lower than others on the RBA. Alternatively, if the loss of focus is associated with use of internet resources (see Sec. IV C), we would anticipate students with loss of focus events to potentially score higher. To account for differences in average score between courses in the data set, z-scores calculated relative to the average score for each individual class were used in calculating correlations. Students with loss of focus events scored higher on average by roughly a quarter of a standard deviation than other students for the introductory RBAs (i.e., a z-score difference of 0.26) and lower on average by roughly a fifth of a standard deviation for the upper-division RBAs.
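To make the grouping of focus events concrete, the following sketch shows one way to collapse the logged focus events into sustained hidden spans and to compute per-class z-scores. The event-record format (fields type, state, and t in milliseconds) is an assumption carried over from the earlier sketch, not the study's actual data schema.

  // Sketch of the grouping described above (assumed log format). A sustained
  // hidden span runs from a 'hidden'-tagged focus event until the next event
  // tagged 'visible', regardless of intervening 'hidden' events.
  function sustainedHiddenSpans(events) {
    const focus = events
      .filter(e => e.type === 'focus')
      .sort((a, b) => a.t - b.t);
    const spans = [];
    let hiddenSince = null;
    for (const e of focus) {
      if (e.state === 'hidden' && hiddenSince === null) {
        hiddenSince = e.t;                    // start of a sustained hidden span
      } else if (e.state === 'visible' && hiddenSince !== null) {
        spans.push({ start: hiddenSince, durationSec: (e.t - hiddenSince) / 1000 });
        hiddenSince = null;                   // student is back on the RBA
      }
    }
    return spans;
  }

  // Per-class z-score normalization used when comparing scores across courses.
  function zScores(scores) {
    const mean = scores.reduce((s, x) => s + x, 0) / scores.length;
    const sd = Math.sqrt(scores.reduce((s, x) => s + (x - mean) ** 2, 0) / scores.length);
    return scores.map(x => (x - mean) / sd);
  }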
C. Copy Events

The primary concern associated with students copying text from an online RBA is that students may do so in order to search the internet in an attempt to "look up" the correct answer. Table III shows the prevalence of copy events within our data set, showing that roughly a tenth of the students in the data set had one or more copy events.

TABLE III. Number of copy events detected in the introductory and upper-division populations. For reference, the total number of valid responses in the introductory and upper-division data sets was N = 1287 and N = 308, respectively.

                                              Introductory   Upper-division
  Number of students with copy events              147             22
  Median number of copy events per student           4              2
  Max number of copy events per student             54             11
  Total number of copy events                      861             67

A copy event, on its own, does not necessarily mean that the student was attempting to web search answers to the questions. However, if a student copies text with the intention of searching the web for that text, this behavior would most likely be characterized by a copy event followed immediately by a sustained browser hidden focus event. To investigate this, we looked for copy events followed within 5 seconds by a sustained browser loss of focus event and counted how many times this occurred for each student (a sketch of this pairing heuristic appears at the end of this subsection). We found that more than three quarters of the copy events (N = 654 of 861 events for introductory, and N = 56 of 67 events for upper-division) fell into this category. This indicates that a majority of copy events were immediately followed by the student switching into a new browser window and remaining there for more than 4 seconds, consistent with the pattern we would expect if they were trying to web search the item prompts. The remaining copy events that were not followed by a loss of focus event were typically characterized by either the first of two quick consecutive copy events followed by a single loss of focus event, or single copy events not connected temporally with a loss of focus event.

Given this pattern, we also examined whether the students with copy events had any difference in performance from other students. For the introductory RBAs, students with copy events scored higher on average than students without copy events. To examine this further, we focused on the N = 147 introductory students who had one or more copy events. We then calculated z-scores for their performance on the subset of questions where they copied text and z-scores for their performance on the subset of questions where they did not copy text. We then averaged the two resulting scores across all students to determine whether students perform better on average on questions where they copied text relative to questions where they did not. We found that the z-score on the subset of copied questions was higher on average by just under half a standard deviation (i.e., a z-score difference of 0.44). This difference is statistically significant (Mann-Whitney U test). Table IV shows the corresponding breakdown of correct and incorrect responses relative to whether the student had copied text from that question.

TABLE IV. Contingency table breaking down how often students responded to a question correctly relative to whether they had copied text from that question. This table includes data from all questions; thus, each count in the table represents a response from one student to one question.

                         Copied text   Did not copy text
  Correct response            559           28,291
  Incorrect response          163           20,596

Finally, we examined whether the performance of the introductory population as a whole was meaningfully impacted by the presence of students with copy events. We examine this both for the total score and the scores for individual items. Removing all students who had copy events from the introductory data set resulted in a drop in overall average score of roughly 1%. This difference represents a very small effect (Cohen's d = 0.05) and was not statistically significant (Mann-Whitney U test).
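The following sketch illustrates the pairing heuristic described above for flagging copy events that are followed within 5 seconds by a sustained loss of focus. As before, the event-record format is an assumption made for illustration.

  // Sketch of the pairing heuristic (assumed log format): count copy events
  // that are followed within windowSec seconds by a 'hidden'-tagged focus
  // event, the signature consistent with web searching the copied prompt.
  function copyFollowedByBlur(events, windowSec = 5) {
    const sorted = [...events].sort((a, b) => a.t - b.t);
    let count = 0;
    sorted.forEach((e, i) => {
      if (e.type !== 'copy') return;
      const followed = sorted
        .slice(i + 1)
        .some(f => f.type === 'focus' && f.state === 'hidden' && f.t - e.t <= windowSec * 1000);
      if (followed) count += 1;
    });
    return count;
  }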
D. Time to Completion

We also examined the total amount of time to completion for each student to determine whether students' scores are related to how long it took them to complete the assessment. Total time data are calculated by comparing the recorded time when the student first opened the survey link to when they made their final submission of the survey. This does not remove periods when browser focus was lost, and can even include a period when the survey window was closed and later reopened. As such, these durations do not necessarily reflect the amount of time a student actually worked on the assessment, merely the amount of time that passed between them opening and submitting the assessment. For the majority of students (65%, N = 843 of 1287 in introductory; 78%, N = 239 of 308 in upper-division), the total time between start and submit fell within a time frame of 15-60 min, consistent with what would be required of a student taking the RBA in class. Total time spent on the RBA showed a statistically significant (though small) Spearman correlation with z-score on the assessment only for the introductory students.

V. DISCUSSION & LIMITATIONS
We collected online responses to four research-based assessments spanning both introductory and upper-division content. This work is part of ongoing research to determine whether students' performance on RBAs shifts when these assessments are given online. For three of the courses in the data set, we also have historical scores from students in these same classes with the same instructor where the RBA was given on paper and during class. Comparisons of the online and in-class scores showed the online scores being roughly 5% lower. This difference was statistically significant only in the case of the introductory population (two-tailed t-test).

ACKNOWLEDGMENTS
This work was funded by the CU Physics Department. Special thanks to the faculty and students who participated in the study and to the members of PER@C for all their feedback.

[1] Adrian Madsen, Sarah B. McKagan, and Eleanor C. Sayre, "Resource Letter RBAI-1: Research-based assessment instruments in physics and astronomy," American Journal of Physics, 245-264 (2017).
[2] Robert J. Beichner, Jeffery M. Saul, David S. Abbott, Jeanne J. Morse, Duane Deardorff, Rhett J. Allain, Scott W. Bonham, Melissa H. Dancy, and John S. Risley, "The Student-Centered Activities for Large Enrollment Undergraduate Programs (SCALE-UP) project," Research-Based Reform of University Physics, 2-39 (2007).
[3] David Hestenes, "Toward a modeling theory of physics instruction," American Journal of Physics, 440-454 (1987).
[4] Catherine H. Crouch and Eric Mazur, "Peer instruction: Ten years of experience and results," American Journal of Physics, 970-977 (2001).
[5] Bethany R. Wilcox, Benjamin M. Zwickl, Robert D. Hobbs, John M. Aiken, Nathan M. Welch, and H. J. Lewandowski, "Alternative model for administration and analysis of research-based assessments," Phys. Rev. Phys. Educ. Res., 010139 (2016).
[6] Dan MacIsaac, Rebecca Pollard Cole, David M. Cole, Laura McCullough, and Jim Maxka, "Standardized testing in physics via the world wide web," Electronic Journal of Science Education (2002).
[7] Ben Van Dusen, Laurie Langdon, and Valerie Otero, "Learning Assistant Supported Student Outcomes (LASSO) study initial findings," in Physics Education Research Conference 2015.
[10] David Hestenes, Malcolm Wells, and Gregg Swackhamer, "Force Concept Inventory," The Physics Teacher, 141-158 (1992).
[11] David P. Maloney, Thomas L. O'Kuma, Curtis J. Hieggelke, and Alan Van Heuvelen, "Surveying students' conceptual knowledge of electricity and magnetism," American Journal of Physics, S12-S23 (2001).
[12] Manher Jariwala, Jayson Nissen, Xochith Herrera, Eleanor Close, and Ben Van Dusen, "Participation rates of in-class vs. online administration of low-stakes research-based assessments," in Physics Education Research Conference 2017, PER Conference (Cincinnati, OH, 2017) pp. 196-199.
[13] Jayson Nissen, Manher Jariwala, Xochith Herrera, Eleanor Close, and Ben Van Dusen, "Performance on in-class vs. online administration of concept inventories," in