Developing a reasoning inventory for measuring physics quantitative literacy
Trevor I. Smith, Suzanne W. Brahmia, Alexis Olsho, Andrew Boudreaux, Philip Eaton, Paul J. Kelly, Kyle J. Louis, Mitchell A. Nussenbaum, Louis J. Remy
Department of Physics & Astronomy, Rowan University, 201 Mullica Hill Rd., Glassboro, NJ 08028, USA
Department of STEAM Education, Rowan University, 201 Mullica Hill Rd., Glassboro, NJ 08028, USA
Department of Physics, University of Washington, Box 351560, Seattle, WA 98195-1560, USA
Department of Physics and Astronomy, Western Washington University, 516 High St, Bellingham, WA 98225, USA
Department of Physics, Montana State University, Bozeman, Montana 59717, USA
In an effort to improve the quality of citizen engagement in the workplace, politics, and other domains in which quantitative reasoning plays an important role, Quantitative Literacy (QL) has become the focus of considerable research and development efforts in mathematics education. QL is characterized by sophisticated reasoning with elementary mathematics. In this project, we extend the notions of QL to include the physics domain and call it Physics Quantitative Literacy (PQL). We report on early-stage development from a collaboration that focuses on reasoning inventory design and data analysis methodology for measuring the development of PQL across the introductory physics sequence. We have piloted a prototype assessment designed to measure students' PQL in introductory physics: the Physics Inventory of Quantitative Literacy (PIQL). This prototype PIQL focuses on two components of PQL: proportional reasoning, and reasoning with negative quantities. We present preliminary results from approximately 1,000 undergraduate and 20 graduate students.
I. INTRODUCTION
The development of mathematical reasoning skills is an important goal in many introductory physics courses, particularly those geared toward students majoring in physics and other physical science and engineering fields. Previous research has shown that students' development of Physics Quantitative Literacy (PQL)—the ability to reason mathematically in the context of physics—is often less than desired [1]; however, few studies have rigorously examined the development of PQL over time or how this might vary across student populations. We have begun developing the Physics Inventory of Quantitative Literacy (PIQL) to address the need for a valid and reliable assessment instrument for measuring students' PQL across the undergraduate physics curriculum.

Enhancing PQL has the potential to strengthen students' knowledge of mathematics [2, 3], better prepare them for future demands to think quantitatively [4], and promote increased equity and inclusion in physics instruction [5, 6]. At its core, quantitative literacy (QL) involves blended use of mathematical concepts and procedures. Both everyday sensemaking and workplace performance rely on QL, and physics is ideally positioned to help students develop these skills. We have developed an 18-question prototype PIQL that focuses on two specific elements of PQL: reasoning with ratios and proportions, and reasoning about negative quantities.

The use of ratios and proportions to describe systems and characterize phenomena is a hallmark of expertise in STEM fields, perhaps especially in physics. Boudreaux, Kanim, and Brahmia identify a set of reasoning subskills to provide a more fine-grained analysis of proportional reasoning, and they isolate college students' specific proportional reasoning difficulties based on assessment items designed to span the proportional reasoning space [7]. The items are categorized into six subskills, which overlap with the early work of Arons, as "underpinnings" of success in introductory physics [8].
Unlike physics experts, novices often have difficulty understanding the many roles signed numbers can play in physics contexts. Brahmia and Boudreaux constructed physics assessment items based on the natures of negativity from mathematics education research [9] and administered them to introductory physics students [1]. They find that students have trouble reasoning about signed quantities in several contexts typically found in the undergraduate curriculum (e.g., negative work, or negative direction of electric field) [5, 10]. Bajracharya, Wemyss, and Thompson report that students struggle to make meaning of negative area under a curve in physics contexts [11]. Hayes and Wittmann report students having difficulty with the use of negative signs to attribute direction to acceleration in a functional representation [12]. All of these studies reveal that signed quantities, and their various meanings in introductory physics, present cognitive difficulties that many students do not reconcile before completing the introductory sequence.

Because QL is ubiquitous throughout the undergraduate physics curriculum, we expect that there may be interaction effects between students' mathematical reasoning skills and their abilities to apply them in multiple physics (and non-physics) contexts. As such, the protoPIQL includes multiple physical contexts for each component of PQL.

We administered the 18-question protoPIQL to students in three different introductory physics courses at a large public research university in the northwestern United States. Students completed the protoPIQL during the first week of class in each of the three quarter-long introductory physics courses, beginning with mechanics (N = …).

FIG. 1. Distribution of scores on the protoPIQL (N = …).

We look both at the instrument as a whole and at specific items that yield particularly interesting results.

II. WHOLE-TEST RESULTS
The protoPIQL consists of 18 multiple-choice questions: 8 permit only a single response and 10 allow students to select multiple responses. For our first round of analyses, students were scored as correct on a question if they selected all correct responses and no incorrect responses. For all data analysis we include only students who answered at least 2/3 of the questions (1,076 out of 1,104 respondents).

Overall, scores are fairly normally distributed (see Fig. 1), with an average (mean, median, and mode) of 11 out of 18 correct, a standard deviation of 3.0, and small but negative values of both skewness and kurtosis (−0.3 and −0.2, respectively). The internal reliability is Cronbach's α = 0.67, which is below the commonly accepted thresholds of 0.8 for making measurements of individuals and 0.7 for making measurements of groups [13]; however, this may be due to the protoPIQL explicitly measuring two different constructs: proportional reasoning and negativity [14].

We use classical test theory (CTT) to evaluate the quality of each question in terms of its difficulty and its discrimination. In CTT, the difficulty is defined as the fraction of students who answer a particular question correctly. Difficulty ranges from 0 to 1, with lower values indicating questions that are harder for students. CTT discrimination is the difference between the difficulty of each question for high-scoring and low-scoring students (upper vs. lower 27% with regard to test score) [15]. Discrimination ranges from 0 to 1, with higher values indicating a question that is answered differently by high-scoring and low-scoring students.

FIG. 2. Classical Test Theory results: (a) Difficulty, and (b) Discrimination for each question. Red lines show lower (and upper) thresholds for desired values.

Figure 2 shows the CTT difficulty and discrimination for each question. The average CTT difficulty is 0.62, and 13 of the 18 questions fall within the generally accepted range of 0.2 ≤ D ≤ 0.8, with the remaining 5 being too easy [13]. Of note in Fig. 2(a) is that most questions have difficulties above 0.5, but there are four questions with difficulty values near 0.25. The average CTT item discrimination between the upper and lower 27% of the class is 0.41, and 15 of the questions were above the common threshold of 0.3, with one more being just below the threshold (0.296 for Q5) [13]. Of note in Fig. 2(b) is that no question achieves a discrimination above 0.6, which is often considered to be highly discriminating. It should also be noted that the maximum possible item discrimination in CTT depends on the item difficulty for D < 0.27 or D > 0.73 [13]; for example, the maximum discrimination for Q1 is limited by its difficulty (D = …).

TABLE I. CTT results, separated by question type: single response (SR), multiple response with single correct answer (MRS), and multiple response with multiple correct answers (MRM). Values indicate the mean; uncertainty is the standard error.

                  SR            MRS           MRM
Difficulty      0.… ± 0.04    0.… ± 0.08    0.… ± 0.…
Discrimination  0.… ± 0.04    0.… ± 0.04    0.… ± 0.…

Future considerations will include a balance in content so that one component of PQL is not featured more heavily than others, and external expert opinion that an item is appropriate and important for our target population.

As mentioned previously, the protoPIQL contains single-response (SR) questions, in which students may choose only one response, and multiple-response (MR) questions, in which students may choose any combination of responses. The MR questions can be further subdivided into those with only a single correct response (MRS) and those with multiple correct responses (MRM) [16]. There are three MRM questions: 9, 12, and 15. Comparing this list to the results in Fig. 2(a) shows a clear trend: all of the MRM questions are among the most difficult questions. Question 14 is the only MRS question with a difficulty level similar to the MRM questions; Q14 asks students about the meaning of a negative component of an electric field, E_x = −10 N/C — a task that may be beyond the abilities of students who have not yet taken a course in electricity and magnetism (70% of the data set).

Table I shows the average difficulty and discrimination for these three groups of questions. These results indicate that the MRS questions are statistically similar to the SR questions in terms of CTT difficulty (both with averages well above the ideal mean of 0.5 [13]), and that the MRM questions are much harder. This may be a result of students being much less likely to correctly guess the answer to an MRM question, given that they must choose the correct combination of responses. Another factor here is that the correct responses to MRM questions tend to involve different aspects of physical quantities. For example, on Q12 students must recognize that negative work indicates that the direction of a component of a force is opposite the direction of the displacement (response C) and that negative work indicates that the energy of the system is decreasing (response E). This may be more complex than recognizing that a negative acceleration means that a component of acceleration is in the negative direction (as on Q11, an MRS question), and previous research indicates that few students use both vector reasoning and scalar reasoning when answering these types of questions [1]. Given these large differences in CTT difficulty values, we may be reaching (or exceeding) the limit of CTT's usefulness, or we may have a situation in which CTT analyses are not appropriate. Interestingly, there are no statistically significant differences between the CTT discrimination values for SR, MRS, and MRM questions.

The big issue here seems to be that students need to correctly choose multiple answers, and each may correspond
with a different piece of knowledge. To examine these questions in more depth, we have categorized student responses using a multilevel correctness scale: selecting all correct answers and no incorrect answers (All Correct), selecting some but not all of the correct answers and no incorrect answers (Some Correct), selecting both correct and incorrect answers (Both), or selecting exclusively incorrect answers (Only Incorrect).

FIG. 3. MRM question results with a 4-tiered correctness scale.

Figure 3 shows the results on the MRM questions using these mutually exclusive levels. The fraction of All Correct responses to each question is consistent with Fig. 2(a), at about 25% for each question, but the distribution of partially correct answers shows that at least 75% of students are choosing one of the correct responses, even if they also choose an incorrect response. Q9 has the largest fraction of students (about 50%) choosing some (but not all) correct responses, with Q12 and Q15 each having about 20% of students in this category. This is notable because Q9 is the only question with three correct responses and two incorrect responses (one of which is a none-of-the-above option), and Q12 and Q15 each have two correct responses and three incorrect responses (with no none-of-the-above option). These results provide evidence that analysis methods beyond the traditional correct/incorrect dichotomy should be explored to fully represent students' understanding of these topics.
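The four-tier categorization described above maps directly onto set operations on a student's selected responses. The sketch below is illustrative only (it is not the project's analysis code); the example option sets are hypothetical, patterned after Q12's keyed responses C and E.

```python
def classify(selected, correct):
    """Four-tier correctness category for one multiple-response item.

    selected: set of option letters the student chose
    correct:  set of option letters keyed as correct
    """
    chose_correct = bool(selected & correct)     # at least one correct option chosen
    chose_incorrect = bool(selected - correct)   # at least one incorrect option chosen
    if selected == correct:
        return "All Correct"     # every correct option and nothing else
    if chose_correct and chose_incorrect:
        return "Both"            # a mix of correct and incorrect options
    if chose_correct:
        return "Some Correct"    # only correct options, but not all of them
    return "Only Incorrect"      # no correct options chosen

# Hypothetical MRM item with keyed responses {"C", "E"} (as on Q12)
correct = {"C", "E"}
print(classify({"C", "E"}, correct))  # All Correct
print(classify({"C"}, correct))       # Some Correct
print(classify({"C", "A"}, correct))  # Both
print(classify({"A", "B"}, correct))  # Only Incorrect
```

Because the four branches are checked in this order, the categories are mutually exclusive, matching how the levels are used in Fig. 3.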
III. IDENTIFYING TROUBLESOME QUESTIONS: CHARGE TRANSFER BETWEEN COMB AND HAIR
As mentioned above, the MRM questions yield some interesting results in terms of distributions of partially correct answers. We have found that data from Q15 are particularly interesting with regard to who chooses which responses. The text of Q15 is shown in Fig. 4. Answering Q15 correctly requires students to know that the charges that are able to move from one object to another are negative electrons (response A), and to recognize that the net charge has both a magnitude and a sign (similar to a net force having a magnitude and a direction) and that the "size" of the net charge depends only on the magnitude of the charge: going from zero to nonzero indicates an increase in the magnitude (response E).
Valeria combs her hair, and as a result the net charge on the comb goes from 0 to −5 C. Consider the following statements about this situation. Select the statement(s) that must be true. Choose all that apply.

a. Negative charge was added to the comb.
b. Charge was taken away from the comb.
c. All of the electric charge in the comb is negative.
d. The net charge on the comb is smaller after Valeria combs her hair.
e. The net charge on the comb is larger after Valeria combs her hair.

FIG. 4. Question 15: correct responses are bold.
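Under the dichotomous scoring described in Sec. II, a response to Q15 earns credit only if exactly the keyed statements (a and e) are selected. A minimal sketch of that rule, for illustration only:

```python
def strict_score(selected, keyed=frozenset({"a", "e"})):
    """Sec. II dichotomous rule: credit only for exactly the keyed option set."""
    return int(frozenset(selected) == keyed)

print(strict_score({"a", "e"}))       # 1
print(strict_score({"a"}))            # 0 (partial knowledge earns no credit)
print(strict_score({"a", "e", "d"}))  # 0 (an added incorrect option voids credit)
```

The last two cases show why the four-tier scale is needed: under strict scoring they are indistinguishable from a wholly incorrect response.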
Questions 14–16 all involve E&M topics: negative component of an electric field (Q14), negative charge (Q15), and negative potential difference (Q16). We expected that students who had completed an E&M course would do better on these questions. Our data indicate that this is true for Q14 and Q16, with statistically significant differences revealed by chi-square analyses (p < …).

FIG. 5. Graduate student results using the 4-tiered correctness scale (N = 20).

… signed quantity. A net charge has both magnitude and sign (or type), so specifically asking about the magnitude would limit our ability to measure how students interpret the negative sign. In an effort to get more input and try alternate wording (such as "amount of unbalanced charge"), we have begun interviewing introductory students to determine the ways in which they interpret the various questions in the protoPIQL. This process is vital for the validation of the PIQL for measuring physics students' PQL [14].

IV. SUMMARY AND FUTURE DIRECTIONS
The results from the protoPIQL are a promising start toward developing a valid and reliable assessment for measuring undergraduate physics students' PQL. In addition to classical test theory, future analyses will include item response theory (both with dichotomously scored data and with the nominal response model), exploratory and confirmatory factor analysis (to ensure that the assessment reflects the desired balance in components of PQL), and additional interviews with both undergraduate students and physics faculty to ensure that the PIQL measures what we want it to measure and that the items are appropriate and important for our target population. Our goal is to achieve good psychometric test parameters while including both breadth and depth of content, and to create an assessment that students can complete in about 30 minutes. We will also explore novel scoring techniques to value growth in students' reasoning skills, not just mastery of the topic. Results from individual questions that appear somewhat anomalous (like those presented from Q15) will help identify the frontiers of future research into physics students' quantitative literacy and reveal topics that are persistently difficult for advanced undergraduate (and graduate) students.
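The classical test theory quantities used in this paper (item difficulty, upper-versus-lower 27% discrimination, and Cronbach's α) can all be computed from a binary students-by-items score matrix. The sketch below is a generic implementation of those standard formulas, not the project's analysis code.

```python
import numpy as np

def ctt_stats(scores, tail=0.27):
    """Compute CTT item statistics from a binary (n_students, n_items) matrix."""
    scores = np.asarray(scores, dtype=float)
    n_students, n_items = scores.shape

    # Item difficulty: fraction of students answering each item correctly
    difficulty = scores.mean(axis=0)

    # Discrimination: difficulty gap between the top and bottom 27% by total score
    totals = scores.sum(axis=1)
    order = np.argsort(totals)
    m = max(1, int(round(tail * n_students)))
    discrimination = scores[order[-m:]].mean(axis=0) - scores[order[:m]].mean(axis=0)

    # Cronbach's alpha: internal consistency from item and total-score variances
    alpha = (n_items / (n_items - 1)) * (
        1 - scores.var(axis=0, ddof=1).sum() / totals.var(ddof=1)
    )
    return difficulty, discrimination, alpha
```

Applied to the full dichotomously scored response matrix, these formulas would yield the per-item quantities plotted in Fig. 2 and the reported reliability coefficient.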
ACKNOWLEDGMENTS
We are grateful to the instructors who administered the protoPIQL questions in their classes, and to the students who participated in the research. This work was supported by NSF grants DUE-1832880, DUE-1832836, and DUE-1833050.

[1] Suzanne White Brahmia, "Negative quantities in mechanics: a fine-grained math and physics conceptual blend?" in Physics Education Research Conference 2017, PER Conference (Cincinnati, OH, 2017), pp. 64–67.
[2] Patrick W. Thompson, "Quantitative reasoning and mathematical modeling," New Perspectives and Directions for Collaborative Research in Mathematics Education, 33 (2010).
[3] Amy B. Ellis, "The influence of reasoning with emergent quantities on students' generalizations," Cognition and Instruction, 439–478 (2007).
[4] Marcos D. Caballero, Bethany R. Wilcox, Leanne Doughty, and Steven J. Pollock, "Unpacking students' use of mathematics in upper-division physics: where do we go from here?" European Journal of Physics, 065004 (2015).
[5] Suzanne Brahmia and Andrew Boudreaux, "Signed quantities: Mathematics based majors struggle to make meaning," in Proceedings of the 20th Annual Conference on Research in Undergraduate Mathematics Education, The Special Interest Group of the Mathematical Association of America, edited by Aaron Weinberg, Chris Rasmussen, Jeffrey Rabin, Megan Wawro, and Stacy Brown (San Diego, CA, 2017).
[6] Jo Boaler,