Online administration of a reasoning inventory in development
Alexis Olsho, Suzanne White Brahmia, and Charlotte Zimmerman
Department of Physics, University of Washington, Box 351560, Seattle, WA 98195-1560, USA
Trevor I. Smith
Department of Physics & Astronomy and Department of STEAM Education, Rowan University, 201 Mullica Hill Rd., Glassboro, NJ 08028, USA
Philip Eaton
Department of Physics, Montana State University, Bozeman, Montana 59717, USA
Andrew Boudreaux
Department of Physics & Astronomy, Western Washington University, 516 High St., Bellingham, WA 98225, USA
We are developing a new research-based assessment (RBA) focused on quantitative reasoning, rather than conceptual understanding, in physics contexts. We rapidly moved administration of the RBA online in Spring 2020 due to the COVID-19 pandemic. We present our experiences with online, unproctored administration of an RBA in development to students enrolled in a large-enrollment, calculus-based, introductory physics course. We describe our attempts to adhere to best practices on a limited time frame, and present a preliminary analysis of the results, comparing results from the online administration to earlier results from in-person, proctored administration. We include discussion of online administration of multiple-choice/multiple-response (MCMR) items, which we use on the instrument as a way to probe multiple facets of student reasoning. Our initial comparison indicates little difference between online and paper administrations of the RBA, consistent with previous work by other researchers.

I. INTRODUCTION

Research-based assessments (RBAs) are now widely used to examine student understanding and learning in physics instruction. In an ongoing, collaborative project, we are developing a new RBA to assess students' quantitative literacy in introductory physics contexts. Development of a valid and reliable assessment requires iterative administration and modification involving large numbers of students. For proper validation of individual items and the assessment as a whole, these administrations should occur in controlled environments. Established best practices for RBAs include in-person (proctored) administration, either on paper or electronically, for course credit but with responses not graded for correctness [1].

RBAs can be time consuming to administer in class (both in terms of class time and in processing the data collected), resulting in interest in online administration. Research suggests that online administration of RBAs originally designed to be administered in person largely does not affect student performance [2–4]. However, some work suggests that online administration may result in reduced test security (with a small percentage of students copying/printing test items) and more frequent "loss of focus" (students opening other browser windows while completing the assessment) [3, 4]. For commonly used or well-known RBAs such as the Force Concept Inventory (FCI) or Brief Electricity and Magnetism Assessment (BEMA), students may be able to find correct answers online, though this does not prevent comparisons with results from in-person administration [4]. Decreases in student participation rates sometimes seen with online administration can be ameliorated by frequent reminders from instructors, and by following best practices similar to those for in-person administration, such as awarding credit based on completion rather than correctness.
Issues of RBA security may be addressed by properly motivating the use of the RBA with students, using a time limit, limiting the number of items students can view at one time, not giving students access to the questions outside of the online form, and requiring that students finish the survey once it has been started [2, 5].

In this paper, we share our experiences related to the rapid deployment of an online administration of an RBA in development, necessitated by the COVID-19 pandemic of 2020. In particular, we seek to answer two research questions: 1) What differences in student performance and participation are there, if any, that may be due to the difference in administration method, as measured by three different metrics; and 2) How does administration method affect students' response patterns for multiple-choice/multiple-response (MCMR) questions?
II. RBA ADMINISTRATION METHODS
In this section, we describe the administration of our RBA under development, both on paper (in person) and online. We discuss the circumstances under which the online version of the RBA was administered, and our attempts to adhere to best practices in a limited time frame.
A. Background: In-person RBA administration
During the initial development of our RBA, we administered it to all students enrolled in the 3-quarter, large-enrollment, calculus-based introductory physics sequence at a large public university in the Pacific Northwest. We ran versions of the RBA over eight academic quarters. It was administered at the beginning of the terms, before significant instruction, thus serving as a "pretest" for each course of the introductory sequence.

Development of a valid and reliable instrument requires regular access to a large number of students for a significant portion of instructional time. For most quarters, we were able to administer the RBA to students during recitation sessions. These sessions are typically used for required small-group activities. As students are accustomed to attending the sessions, we were able to achieve a high participation rate. This also allowed us to proctor the assessment, consistent with best practices [1].

Proctoring the instrument administration was resource-intensive. The assessment was administered in over 50 recitations, each 50 minutes long, during the first week of instruction. Because of the timing (during the first week of the quarter), preparing physics department TAs to proctor the assessment presented a significant challenge.

For most in-person administrations of the assessment, students read items from a stapled packet and recorded their responses on a paper answer form as well as electronically. Our instrument includes several "multiple-choice/multiple-response" (MCMR) items that ask students to select all answer choices they feel are appropriate. These items could not be handled by the University's multiple-choice scoring machines. Therefore, quarterly preparation for administration of the instrument involved not only printing the items and answer forms but also creating online surveys into which students could input their responses. Because of ongoing changes to the assessment during the development period, the stapled packets and the online surveys could not be reused. Students were asked to enter their responses online using their laptop, smartphone, or tablet if possible. Students that did not have or bring such a device with them to the class session, and so were unable to enter their answers online, were asked to indicate this on their paper answer form. After the instrument administration was finished for the quarter, a member of the research team entered those responses manually. Between 25 and 50 sets of responses were added manually each quarter.

Although we believe the methods described above resulted in high-quality data from a large number of students, they required a significant investment of time and resources. Moreover, some students misunderstood the instructions, leading research team members to spend additional time making sure the data set was complete and that students were receiving credit for their work. We began to consider online administration methods as an alternative, even exploring purchasing ∼ electronic tablets through a University-based grant. In this scheme, there would be no paper version of the instrument; students would access the survey on the tablets during class, proctored by TAs or members of the research team.
While this method of administration would still require significant time and effort by research team members, we believed it would be more straightforward for students than the previous procedure of entering responses online after completing the assessment on paper.

Though our focus was on in-person, proctored administration of the assessment, we began to consider whether online, unproctored administration would better support validation and widespread dissemination. While existing research suggests little or no significant difference in student performance between proctored and unproctored administrations of some RBAs [2–4], researchers recommend that online, unproctored administration be validated separately [2]. We wanted to determine whether our instrument could be administered online and unproctored by instructors who were reluctant or unable to allocate class time for administration. Moreover, though we generally have access to students during the first week of classes during scheduled recitation sessions, scheduling was difficult during academic quarters in which instruction started midweek, leading to confusion and decreased participation rates.

B. Online administration
The COVID-19 pandemic of early 2020 forced the issue. With the University moving to all online instruction, in-person administration of the assessment became impossible. Although online "proctoring" services exist [6], the proctoring requirements do not align well with University policies regarding computer camera use during virtual instruction, and do not take into account possible limitations on students during such an uncertain and difficult period.

We ran the RBA unproctored and entirely online using the University's existing survey/quiz platform. To mitigate student stress during the rapid shift to online learning, the University suggested that no graded work be required during the first week of instruction. Because we do not grade students' responses to the RBA for correctness, we decided to run the RBA, as usual, during the first week of the term in each of the three courses of the calculus-based introductory physics sequence. Because we were aware of the tendency of some students to place undue importance on such assessments, however, we presented the RBA as a low-stakes survey.

We adhered to best practices [3, 7] as much as possible: the RBA had a 50-minute time limit [8], equal to the usual class length in which the instrument was administered (we note that this is longer than it should take for students to complete the instrument); multiple reminder emails were sent to students to increase participation rate; and course credit was offered for participation, but students' responses were not graded for "correctness." In addition, we constructed the online version of the instrument to discourage copying or saving of test items: each item was shown in a browser window on its own; students were not able to backtrack in the RBA [9] and were not shown a summary of their work or given the correct answers after completion. A video (less than 3 minutes long) embedded at the beginning of the RBA explained the purpose of the RBA and reiterated that the RBA was associated with course credit to be awarded on the basis of participation rather than the number of questions answered correctly. This is in line with best practices to discourage students from searching for answers to the items on the internet, while still motivating students to give their best efforts on the assessment [2].

Many online testing platforms will (automatically or by request) randomly order each test item's responses. We note that this does not adhere to best practices: validation of individual items only holds for the versions used during the validation process [5]. Randomizing answer choices was therefore not used to decrease cheating. Especially at the beginning of the academic quarter and with a majority of students geographically separated due to the pandemic, we believed that students were unlikely to attempt to collaborate with each other when completing the assessment.

Because we recognized that a majority of students completing the survey for the first time would have little to no experience with MCMR items, we made some changes to the instrument to increase the likelihood that students would recognize that they could select multiple responses for those items. All of the MCMR items were moved to the end of the survey. After answering the last multiple-choice/single-response (MCSR) item, students saw a page with no instrument item, but rather a statement that the remaining questions on the survey might have more than one correct response, and that students should choose all answers that they feel are correct.
At the top of the page for each of the remaining items (all MCMR), students saw a reminder that the question might have more than one correct response. We also prompted students to "choose all that apply" in the question stem.
III. COMPARISON OF ONLINE AND IN-PERSON ADMINISTRATION: RESULTS AND DISCUSSION
To investigate differences in student performance, we decided to compare responses from an earlier, in-person administration of the assessment to those from our online administration. We chose to use data from the in-person version of the RBA that was most similar to the online version. Of the 20 items on the assessment, only three were substantially changed between the two versions, allowing us to compare performance on the remaining 17 items.

We compare administration methods using the following metrics:

1. Participation rate.
2. Student average score on the 17 items in common between the two versions of the instrument.
3. The 17 items' classical test theory (CTT) difficulty statistics.

In addition, we compared the percentage of students choosing more than one response on the instrument's multiple-choice/multiple-response items between the two versions, to gauge whether the online instructions for the MCMR items were sufficiently clear.

The following sections present data collected in each of the three courses of the calculus-based introductory physics sequence. We refer to students in these three courses as "C(I)," "C(II)," and "C(III)," indicating the quarter of the instructional sequence in which the students were enrolled when completing the assessment.
A. Participation rates
Overall participation rates were similar for in-person and online administration. For in-person administration, the overall participation rate was 91% (93%–92%–89% for C(I)–C(II)–C(III) students); for online administration, the overall participation rate was 90% (93%–89%–89% for C(I)–C(II)–C(III) students). For the online administration, we counted any attempt at completing the survey as participation. (This included a small number ( < ) of students who opened the survey but did not answer any of the items.)

We attribute the high participation rates on the online version to the multiple reminder emails and course web page announcements about the assessment, as well as the assignment of course credit for participating in the assessment. In addition, as in previous quarters, the assessment was associated with the weekly small-group-work recitation sessions; students were told that the survey constituted the week's work associated with the recitation session. Finally, administration of the survey during the first week, before other graded work was due, may have boosted participation, as students were not yet overly burdened with assignments.

Additionally, administering the assessment online allowed us to track the amount of time individual students took to complete it, which we were unable to do during previous in-person administrations. Although we cannot formally compare the time taken on the online version to that on the in-person versions, we do use the time data from the online administration to address student "buy-in," that is, whether or not students seem to take the assessment seriously. Over all three courses, the average time spent on the survey was 27.3 minutes (31.8–27.0–23.1 minutes for C(I)–C(II)–C(III) students, respectively) [10]. Classroom observations from proctors during in-person administration suggest that students take about 40 minutes to complete the RBA in that setting. We believe the small (presumed) difference may be due to the simpler test-taking process in the online context. When completing the assessment online, students did not record their responses on paper and then enter them electronically after navigating to a website; rather, they read and responded to the items entirely online. Time spent navigating to the website on their computer, smartphone, or tablet is not included in their time. The time-on-task data are consistent with the amount of time that we believe is necessary to read and respond to items with an appropriate amount of effort.

We did notice a small number of students in each of the courses taking ten minutes or less to complete the RBA: 5% overall (1%–3%–11% for C(I)–C(II)–C(III) students). Ten minutes is likely not enough time to read and consider the answer choices carefully, suggesting that these students may not have been taking the assessment as seriously as we would like. Fortunately, only for the C(III) students was the percentage of students spending less than 10 minutes a sizable fraction of the student population. Because we ran the assessment in each quarter of the 2019-2020 academic year, many of the C(III) students were seeing the assessment for the third time; we would expect these students to spend less time on the RBA due to familiarity with the material and assessment items.
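For concreteness, the sketch below expresses the participation and time-on-task checks described above. The SurveyRecord layout and field names are our own illustration, not the survey platform's actual export format.

```python
# Minimal sketch of the participation and time-on-task checks described
# above. The SurveyRecord layout is a hypothetical stand-in for the survey
# platform's export; any attempt counts as participation, and completions
# of ten minutes or less are flagged as possibly rushed.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SurveyRecord:
    course: str                # "C(I)", "C(II)", or "C(III)"
    attempted: bool            # True if the student opened the survey
    minutes: Optional[float]   # time on task; None if never opened

def participation_rate(records: list[SurveyRecord]) -> float:
    """Fraction of enrolled students who made any attempt."""
    return sum(r.attempted for r in records) / len(records)

def rushed_fraction(records: list[SurveyRecord], cutoff: float = 10.0) -> float:
    """Fraction of attempts completed in `cutoff` minutes or less."""
    attempts = [r for r in records if r.attempted and r.minutes is not None]
    return sum(r.minutes <= cutoff for r in attempts) / len(attempts)
```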
B. Overall student performance and item difficulty

In this section, we compare student performance on the two administrations of the assessment, denoted "Online" and "In-person." We limit our analyses to the data collected from students enrolled in the first quarter of the calculus-based introductory physics sequence (C(I) students). We believe this is the best comparison, as these groups contain students seeing the instrument for the first time. We compared student performance on the two versions of the instrument in two ways: using the average score for a subset of 17 items in common between the two versions, and using changes in item difficulty for those 17 items.

The average overall score and standard deviation on the subset of 17 items for Online was . ± . (N = 397); for In-person, it was . ± . (N = 326), a percent difference of about . While this difference is slightly larger than expected from past quarters' data, the effect is fairly small, with Cohen's d ≈ . .

In addition to looking at students' scores to compare performance for the two administrations, we calculated the classical test theory statistic item difficulty. The item difficulty is the fraction of students answering each item correctly; therefore, a higher difficulty value indicates an easier question. Comparing item difficulty for the 17 common items, we found that while the average difficulty over all items in the set was not significantly different, the individual difficulty was significantly different for five items (binomial test p < . ). A comparison of the item difficulties is shown in Fig. 1. All five of the items had lower difficulty values for the online version of the instrument, indicating the items were more difficult for students when presented online, consistent with the lower overall score described above. Four of the five items (Q15, Q18, Q19, and Q20 in Fig. 1) are MCMR items; we discuss a possible explanation for the difference in Section III C below. We typically see large variations in the item difficulty for two of these items (Q15 and Q19), but the difficulties for those items during online administration are lower than expected from previous administrations.

FIG. 1. A comparison of CTT item difficulty for 17 items from the assessment for C(I) students. Red bars represent item difficulty on the In-person administration of the assessment; blue bars are used for the Online administration. Error bars represent the standard error. Dashed lines show the upper and lower bounds for desired item difficulty.
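As an illustration of these statistics, the sketch below computes CTT item difficulty, a pooled-SD Cohen's d for total scores, and a per-item binomial test comparing online correct counts against the in-person difficulties. The array layout and the use of scipy.stats.binomtest are our assumptions; the paper does not specify its analysis code or the exact form of the test.

```python
# A sketch of the item-level comparison described above, assuming each
# administration is a 0/1 NumPy array of shape (students, items). The use
# of scipy.stats.binomtest is an assumption; the authors' exact procedure
# is not specified beyond "binomial test."
import numpy as np
from scipy.stats import binomtest

def ctt_difficulty(scores: np.ndarray) -> np.ndarray:
    """CTT difficulty: fraction of students answering each item correctly.
    Higher values indicate an *easier* item."""
    return scores.mean(axis=0)

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Effect size for the difference in mean total scores (pooled SD)."""
    pooled_sd = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
    return (a.mean() - b.mean()) / pooled_sd

def per_item_pvalues(online: np.ndarray, in_person: np.ndarray) -> list[float]:
    """Binomial test per item: are the online correct counts consistent
    with the in-person difficulty for that item?"""
    p_ref = ctt_difficulty(in_person)
    n = online.shape[0]
    return [
        binomtest(int(online[:, i].sum()), n, p_ref[i]).pvalue
        for i in range(online.shape[1])
    ]
```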
C. Multiple-Choice/Multiple-Response items
Six of the 20 items on the instrument were multiple-choice/multiple-response (MCMR), asking students to choose as many answers as they believed were correct for each item. When the instrument was administered in person, there were multiple opportunities to remind students that they could choose more than one response on these items, both in writing on the instrument itself, and also verbally by the proctor. Validation interviews suggested that multiple reminders were necessary, as this variety of question is relatively rare on the assessments typically encountered by students. We were concerned that many students would not recognize this type of question when encountering it online, especially students who had not completed the instrument previously. As noted in Section II B above, we made several changes to the format of the assessment to emphasize to students that they should choose more than one response for the MCMR items if appropriate.

To assess the effectiveness of these measures, we compared the percentage of students choosing more than one response on each MCMR item, finding an increase for all MCMR items when administered online. We conclude that our measures were effective. However, as only two of the MCMR items on the RBA have more than one correct response, an increase in the number of answers chosen is not necessarily associated with an improvement in performance. Increases in the number of responses selected are generally associated with a decrease in the correct response rate, as MCMR items were scored dichotomously (i.e., an MCMR item was only counted as correct if a student selected the correct answer choice(s) and did not select any of the incorrect choices). For items Q15, Q18, Q19, and Q20, the four MCMR items for which we saw significant decreases in CTT item difficulty, the fraction of students who selected more than one answer choice increased by 9%, 22%, 9%, and 16%, respectively, from the In-person to the Online administration. Item Q18 had two correct responses; as with the other items, there was a decrease in the item difficulty statistic and an increase in the percentage of students choosing more than one response.
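The sketch below implements one reading of this dichotomous scoring rule (exact match between the selected set and the keyed set) along with the multiple-response fraction reported above. The item key in the example is invented for illustration.

```python
# A minimal sketch of the dichotomous MCMR scoring described above, under
# the reading that an item is correct only when the selected responses
# exactly match the keyed set. The example key is invented for illustration.
def score_mcmr(selected: set[str], key: set[str]) -> int:
    """Return 1 if every keyed response and nothing else was selected."""
    return int(selected == key)

def multi_select_fraction(responses: list[set[str]]) -> float:
    """Fraction of students selecting more than one response on an item."""
    return sum(len(s) > 1 for s in responses) / len(responses)

# Hypothetical item with two correct responses (as Q18 has):
key = {"B", "D"}
print(score_mcmr({"B", "D"}, key))       # 1: exactly the keyed set
print(score_mcmr({"B"}, key))            # 0: missing a correct choice
print(score_mcmr({"B", "D", "E"}, key))  # 0: includes an incorrect choice
```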
IV. DISCUSSION AND FUTURE WORK
In this paper we describe preliminary work toward validating an RBA in development for use with college-level introductory physics in an online, unproctored environment. Initial results tentatively suggest that students take the assessment seriously, perform at roughly the same level as for in-person administration, and are able to understand that MCMR items allow for multiple responses. To continue toward a valid and reliable online assessment, we must learn more about how students interact with test items when using a computer or other internet-capable device, especially items for which there seems to be a significant difference in performance when administered online compared to on paper. We plan to develop an online interview protocol that may help us understand how student reasoning may change when the assessment is given in an online format.

Although there were differences in item difficulty between the two versions of the assessment discussed, we note that most items still fall within the desired range for difficulty for first-term students, as seen in Fig. 1. The data indicate that the bulk of the difference is due to students being more willing to choose multiple responses for MCMR items. As above, we need more information about how students interact with and interpret these types of questions in an online environment. Further analyses of particular answer choices on the MCMR items, going beyond dichotomous scoring, may also provide insight: for example, we are interested in changes in the percentage of students choosing both correct and incorrect responses for different administration methods. We would also like to investigate the effect of having the MCMR items interspersed with the MCSR items on the RBA, as was done for the prior in-person administrations, rather than grouped together at the end.
ACKNOWLEDGMENTS
This work is supported by the National Science Foundation under Grants No. DUE-1832836, No. DUE-1832880, No. DUE-1833050, and No. DGE-1762114.
[1] A. Madsen, S. B. McKagan, and E. C. Sayre, Best practices for administering concept inventories, The Physics Teacher, 530 (2017).
[2] S. Bonham, Reliability, compliance, and security in web-based course assessments, Phys. Rev. ST Phys. Educ. Res., 010106 (2008).
[3] J. M. Nissen, M. Jariwala, E. W. Close, and B. Van Dusen, Participation and performance on paper- and computer-based low-stakes assessments, International Journal of STEM Education, 21 (2018).
[4] B. R. Wilcox and S. J. Pollock, Investigating students' behavior and performance in online conceptual assessment, Physical Review Physics Education Research, 020145 (2019).
[5] A. Madsen and S. McKagan, Administering research-based assessments online (PhysPort expert recommendation) (updated 2020).
[6] Many of these systems can be described more accurately as surveillance: they cannot interact with the students or provide a physical presence.
[7] B. R. Wilcox and H. Lewandowski, A summary of research-based assessment of students' beliefs about the nature of experimental physics, American Journal of Physics 86