Cognitive Reflection Test and the Polarizing Force-Identification Questions in the FCI
Allan L. Alinea
Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, College, Los Baños, Laguna 4031, Philippines, [email protected]
Abstract
The set of polarizing force-identification (PFI) questions in the FCI consists of six items all essentially asking one question: what set of forces acts on a given body? Although it may sound trivial, these questions are among the most challenging in the FCI. In this work involving 163 students, we investigate the correlation between student performance on the set of PFI questions and the Cognitive Reflection Test (CRT). We find that for scores both in the FCI as a whole and in the PFI questions, the range of values of the Pearson coefficient at the 95% confidence level suggests that cognitive reflection may be one of the contributing factors in student performance in the FCI. This is consistent with the idea that a high level of cognitive reflection may help in eliminating seemingly valid choices (misconceptions) in the FCI that are intuitive from everyday experience or "common sense" but otherwise misleading. The ability to activate System 2 in Dual Process Theory, whether from System 1 or right after reading a physics problem, may contribute to narrowing down the set of prospective valid answers to a given physics problem. Complementary to cognitive reflection are other factors associated with deep understanding of physics, whose effects are expected to become more evident with the level of difficulty of a set of physics problems. Given two students with the same level of cognitive reflection, the one with the deeper understanding of physics is more likely to get the correct answer. In our analysis, the range of the correlation coefficient for the set of PFI questions is downshifted with respect to that for the FCI as a whole. This may be attributed to the more challenging nature of the PFI questions compared to a significant fraction of the remaining questions in the FCI.

Keywords:
Polarizing Force-Identification Questions; Cognitive Reflection Test; Force Concept Inventory; Dual Process Theory; Physics Education

Accepted for publication: European Journal of Physics
In spite of the enormity and complexity of possible information configurations that it can process, it appears that the way the human mind thinks or decides on a given problem or situation can often be simplified into merely two systems. Dual process theory (DPT) in psychology distinguishes these two systems as intuitive ('System 1'), considered fast and autonomous, and analytic ('System 2'), considered deliberative and slow; see e.g., Refs. [1, 2, 3]. System 1 allows us to make quick decisions with minimal strain on our mental resources (e.g., recognizing a friend in a classroom). On the other hand, System 2 enables us to solve problems where higher-order thinking is required (e.g., solving a 10 × 10 sudoku). In dealing with different situations, it is helpful to efficiently choose the appropriate process to use. More properly, with System 1 often proceeding "unconsciously," it is important to be cognizant of the need to activate System 2 on top of System 1. The Cognitive Reflection Test (CRT) [4] is a widely used measure of the propensity to override System 1 with System 2. It is composed of a three-item set of questions designed to initially draw the problem solver into activating System 1. Considering the bat-and-ball problem in the CRT, for instance, using System 1, the tendency is to answer 10 cents. A quick reflection, however, indicates that this cannot be correct; for then, the total cost would be $0.10 + $1.10 = $1.20! The correct answer, obtained through a short algebraic manipulation or trial-and-error, is 5 cents.

Scores in the CRT are found to have moderate correlation with performance on the Wonderlic Personnel Test [4], heuristics-and-biases tasks [5], incidences of the conjunction fallacy [6, 7], and susceptibility to some behavioral biases [8], amongst others. From a simplified perspective, these studies suggest that people who tend to think deeply, as may be measured by the CRT, are able to make better decisions and solutions to problems. In the field of Physics Education, to which this paper belongs, such a perspective, although simplified, offers an attractive avenue to look into the possible correlation between the CRT and student performance on some standardized tests. The idea is that DPT, as it relates to the CRT, may be able to explain the way students approach and solve problems in Physics. This, in turn, may give us insight into the Physics learning process and, for us as educators, the ways by which we may be able to improve it.

Following a similar line of thinking, Wood, Galloway, and Hardy [9] (see also Refs. [10, 11]) investigated the question "Can dual processing theory explain physics students' performance on the Force Concept Inventory [FCI]?" They examined the relationship between student performance on the CRT and the FCI [12, 13] and found a moderately positive linear correlation between the two for both pretest and post test. The "findings indicate that students who are more likely to override the system 1 intuitive response and to engage in the more demanding cognitive reflection needed to answer the CRT question correctly are also more likely to score highly on the FCI, implying that similar cognitive processes account for at least some of the cognitive abilities needed for each test." [9]

This study intends to further probe the possible relationship between the CRT, DPT, and the FCI.
In particular, while not necessarily neglecting the FCI as a whole, we wish to compare student performance on the set of six Polarizing Force-Identification (PFI) questions (see Refs. [14, 15]) in the FCI, namely, questions 5, 11, 13, 18, 29, and 30, with their scores in the CRT. This subset of the FCI effectively asks only one basic question: identify the force(s) acting on a given body. Surprisingly, out of the five choices for each question, the majority of the students tend to select only two choices (the polarizing choices). One of these choices is the correct answer containing the right set of forces acting on the given body, while the other contains the right set of forces plus at least one erroneous force; see
Fig. 1. The inclusion of this erroneous force confuses students, causing the "polarization" of student responses.

Figure 1: In this sample question, one is asked to identify the force(s) acting on the sphere. Whereas most students may be able to identify forces i and ii correctly, the addition of the erroneous force iii may confuse them, causing "polarization" of student answers between choices (D) and (E).
The subjects of our study were 163 students in engineering, chemistry, and physics degree programs offered by the University of the Philippines Los Baños. All of these students had already passed the prescribed calculus-based mechanics course and were taking another fundamental physics course under the author of this work when they sat for the FCI and CRT. The two tests were administered as pen-and-paper tests, and students were given incentives to take the tests seriously. Data gathering took place at the University of the Philippines Los Baños from Academic Years 2017-2018 to 2018-2019.

To minimize possible bias in the CRT scores, we opted to remove results for students who were not taking the test for the first time. The remaining 163 test results for the FCI and CRT were subjected to item analysis with the help of ZipGrade [19] and a spreadsheet application. Common statistical parameters such as the mean, standard error, and Pearson coefficient were calculated to find possible meaningful relationships between scores in the FCI as a whole, the set of PFI questions, and the CRT.

Table 1 shows the distribution of students with respect to CRT scores ranging from zero out of three (all wrong answers) to three out of three (perfect score). About half (47%) of the students got a perfect score in the test while only one tenth (10%) got the lowest score. With 43% scoring one or two, the skewed frequency distribution drives the average to 2.1/3 (close to that of Ref. [9], 2.3, with similar cohorts). This is about one and a half times higher than that in Ref. [18] covering more than 40,000 students. The difference may be due to a stronger inclination of our subjects toward mathematics, as suggested by their chosen degree programs in engineering, chemistry, and physics, compared to the population with more varied interests investigated in Ref. [18]. In Ref.
[20], the authors found a moderately positive correlation between cognitive reflection and numeric ability.

The majority of the students who got the correct answers performed some sort of short mathematical calculation on paper. Such students constitute the majority of our test subjects based on Table 2. For the bat-and-ball problem, for instance, the majority of the nearly 80% who got the correct answer approached the problem by solving some form of algebraic equation(s); some did try trial-and-error to fit the conditions of the problem. For the lilypad problem, a geometric sequence, either from day 1 to day 48 or backwards from day 48, can be found in the student solutions with correct answers. For the machine problem, short solutions involving the use of ratio and proportion can be found on student papers. Admittedly, a few students who got the wrong answers made some marks on paper in addition to the final answer. However, the "seriousness" of these scribbles is not on par with that of students who got the correct answer(s). In the lilypad problem, for instance, one can find '48/2' yielding 24 days, the intuitive but wrong answer.

CRT Score                     0/3        1/3        2/3        3/3        Mean
Percent Students (No.)        10% (16)   22% (36)   21% (34)   47% (77)   2.1
Table 1: Percent (and number) of students who got CRT scores from 0/3 (all wrong) to 3/3 (perfect score).
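As a consistency check, the reported mean of 2.1 can be reproduced from the counts in Table 1; the short sketch below (a minimal calculation of our own, using the counts from the table) does the arithmetic:

```python
# Reproduce the mean CRT score in Table 1 from the number of students
# at each score (0/3 through 3/3).
counts = {0: 16, 1: 36, 2: 34, 3: 77}
n = sum(counts.values())  # total number of students
mean = sum(score * k for score, k in counts.items()) / n
print(n, round(mean, 1))  # 163 2.1
```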
CRT Problem                   bat and ball   machine    lilypad
Percent Students (No.)        79% (128)      59% (96)   68% (111)
Table 2: Percent (and number) of students who answered each CRT [4] question correctly. Problems 1, 2, and 3 are labeled bat and ball, machine, and lilypad, respectively.

It is worth noting that students were not required to present a mathematical solution, nor any other form of solution, to the CRT; only the final answers were required in the closely supervised test. This was made clear in the instructions explained in class before students could start the tests. Any mathematical solution (be it serious or simply light scribbles) in the form of algebraic manipulation, ratio and proportion, or geometric sequence, was of their own volition. We find that the majority of students who got the wrong answer in any of the CRT problems simply gave the wrong intuitive answers (i.e., 10 cents, 100 minutes, and 24 days for the bat-and-ball, machine, and lilypad problems, respectively) without any supplemental solution at all, or tried to perform some light mathematical scribbles on paper only to end up with or confirm the same wrong intuitive answers; e.g., writing 1:1:1 for the machine problem and then yielding the intuitive but wrong answer of 100 minutes. The intuitive answers account for 86%, 69%, and 54% of all the wrong answers in the bat-and-ball, machine, and lilypad problems, respectively; these are the majority of wrong answers, as in Ref. [9]. On the other hand, among the correct student responses are clear cases with erasure marks covering the wrong intuitive answers right beside the correct one. This indicates the transition from System 1 to System 2, consistent with the aim of the CRT.

Having said this, there is a possibility that one could have actually activated System 2, as may be evident in their scratch work, for a given CRT question, without finding either the correct answer or the intuitive but wrong answer. This may then weigh against the accuracy of the CRT score as a measure of the tendency to activate System 2.
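The reflective (System 2) solutions described above, as against the intuitive (System 1) answers of 10 cents, 100 minutes, and 24 days, can be sketched as follows for the three standard CRT items [4]:

```python
# Bat-and-ball: a bat and a ball cost $1.10 and the bat costs $1.00 more
# than the ball. Intuition says 10 cents, but then the total would be
# $0.10 + $1.10 = $1.20; solving ball + (ball + 1.00) = 1.10 gives 5 cents.
ball = (1.10 - 1.00) / 2
print(round(ball, 2))  # 0.05

# Machine problem: 5 machines make 5 widgets in 5 minutes, i.e., each
# machine makes one widget per 5 minutes, so 100 machines also need only
# 5 minutes for 100 widgets (not the intuitive 100 minutes).
minutes = 5 * 100 / 100
print(minutes)  # 5.0

# Lilypad problem: the patch doubles daily and covers the lake on day 48,
# so it covered half the lake one doubling earlier (day 47, not day 24).
day_half = 48 - 1
print(day_half)  # 47
```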
Assuming that students who got the correct answers in the CRT activated System 2 and those who gave the intuitive but wrong answers (10 cents, 100 minutes, 24 days) activated only System 1, this could possibly account for a maximum of 14%, 31%, and 46% of all the wrong answers in the bat-and-ball, machine, and lilypad problems, respectively. However, consistent with the immediately preceding two paragraphs, we find this group of students to be somewhat of a polar opposite to the group of students who got the correct answer(s): most students who got the wrong but non-intuitive answers did not write any form of solution at all. Some scribbled a one-liner numeric calculation (e.g., '100/5' for the machine problem and '48/4' for the lilypad problem) which may hardly be considered good evidence of activating System 2. For the three CRT problems, we find only three cases (two for the bat-and-ball problem, one for the machine problem, and zero for the lilypad problem) with some tractable short algebraic calculation corresponding to wrong and non-intuitive answers. This gives us confidence to hold on to the raw CRT score as our measure of the level of cognitive reflection, at least as far as the scope of this study is concerned.

Having discussed the CRT results, let us now look into student performance in the FCI with respect to the CRT. Figure 2 shows the distribution of student scores for the two tests. For CRT scores ranging from 1 to 3, the FCI mean score exhibits an upward trend suggestive of a good linear correlation with the former for the mentioned range. However, the FCI mean score for the CRT score of zero, in addition to the errors associated with the mean, somewhat smears this prospect for a possible good positive linear correlation. The Pearson coefficient for the CRT-FCI scores turns out to be 0.24 in the approximate range [0.09, 0.38] (computed through the Fisher transformation) at the 95% confidence interval.
Assuming the same normality condition, this is consistent with the results in Ref. [9] for the pretest, r = 0.38 in the approximate range [0.23, 0.51], and post test, r = 0.32 in the approximate range [0.17, 0.46], at the same confidence interval; see Ref. [21] about comparing correlation coefficients. (To make the confidence interval for r narrower at the same base value of the correlation coefficient, a considerably larger sample size would be needed.)
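The confidence intervals quoted above follow from the Fisher z-transformation; a minimal sketch (the function name is ours) that reproduces the quoted range for n = 163 is:

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson coefficient r from a sample of
    size n, via the Fisher transformation z = atanh(r) with standard
    error 1/sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# CRT vs. FCI as a whole: r = 0.24, n = 163 students.
lo, hi = pearson_ci(0.24, 163)
print(round(lo, 2), round(hi, 2))  # 0.09 0.38
```

Since the half-width in z-space scales as 1/sqrt(n - 3), narrowing the interval at the same base value of r requires a correspondingly larger sample.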
We provide a two-fold explanation for this finding. Firstly, if the CRT is a good measure of the tendency to shift from System 1 to System 2 within the context of DPT, it means that we cannot rule out the possibility that one of the contributing factors in gaining a high score in the FCI is a high level of cognitive reflection. The FCI questions each come with five choices, four of which are incorrect; many of the incorrect choices invoke student misconceptions or misleading preconceptions [22, 23] in fundamental classical mechanics. Many of these misconceptions or preconceptions, in turn, are from "common-sense" everyday experiences that constitute their intuition. For instance, students may have a predisposition to decide right away from "common sense" that heavier objects tend to fall faster than a lighter object of the same size and shape. A low level of cognitive reflection can drive a student to select intuitive answers that are wrong. This may then lead, as one of the contributing factors, to a low FCI score.

Secondly, while a shift from System 1 to System 2 may contribute to the more "serious" drive to find the correct answer to an FCI question, this may not be enough. After System 2 is activated, some incorrect choices may be discarded after a short but relatively deeper reflection compared to System 1. However, there could remain further hurdles, the other incorrect answers, before a student could pinpoint the correct answer. This means that even if all students start out by activating System 2 without necessarily passing through System 1 (similar to when encountering a math problem involving, say, the extraction of a square root), a deep understanding of physics still matters in arriving at the correct answer.

Figure 2: Distribution of FCI scores with respect to CRT scores. The error bars are set one standard error of the mean above and below the mean.
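The error bars in Fig. 2 are standard errors of the mean; for any subgroup of FCI scores (the list below is illustrative, not the study's data), they can be computed as:

```python
import math
import statistics

def mean_and_sem(scores):
    """Mean and standard error of the mean (sample stdev / sqrt(n))."""
    m = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return m, sem

# Hypothetical FCI scores of students sharing one CRT score:
group = [12, 18, 22, 9, 15, 20, 17, 25]
m, sem = mean_and_sem(group)
print(round(m, 2), round(sem, 2))  # 17.25 1.85
```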
We identified and elaborated on the six PFI questions in the FCI in our earlier study presented in Ref. [14]. Back then, our sample size was only about 50 (international) students. Although relatively small, the pattern for the PFI questions was sharp enough to be worthy of publication. Figure 3 confirms the existence of these PFI questions, now with a much larger sample size of 163 students, triple that of the former study. As can be seen in the figure, the majority of the students, for the six force-identification questions, effectively choose only two (polarizing choices) out of the five choices. Except possibly for the identified PFI question 3, where the number of students who chose letter B is similar to that for letter D (the correct answer), the six bar graphs are consistent with the result in Ref. [14].

Figure 3: Distribution of answers for the six PFI questions (Questions 5, 11, 13, 18, 29, and 30 in the FCI). Of the five choices, with X meaning no answer, the majority of the students chose the two polarizing choices: one containing the correct answer and the other being effectively a superset of the correct answer.
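The "polarization" visible in Fig. 3 can be quantified as the fraction of responses captured by the two most popular choices; a sketch with made-up response counts (not the study's data):

```python
from collections import Counter

def polarization(answers, k=2):
    """Fraction of all responses falling on the k most popular choices;
    values near 1 for k = 2 flag a polarizing item."""
    counts = Counter(answers)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(answers)

# Illustrative response pattern for a PFI-like item (163 students):
sample = ["D"] * 70 + ["E"] * 75 + ["A"] * 8 + ["B"] * 6 + ["C"] * 4
print(round(polarization(sample), 2))  # 0.89
```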
With the PFI questions at hand, let us look into their possible relationship with the CRT. Figure 4 shows the distribution of student scores for the set of PFI questions with respect to the CRT score. The graph shown is effectively a subset of that shown in Fig. 2 involving all the questions in the FCI. For the PFI questions with respect to the CRT, the error bars are wider. There seems to be a rising trend between the PFI mean score and the CRT score from a CRT score of 0 up to 2. However, the downshift at a CRT score equal to 3 seems to have spoiled this possible trend. All in all, we get a correlation coefficient of r = 0.096 in the approximate range [-0.06, 0.25] at the 95% confidence interval.

The base value of the Pearson coefficient is small, but its range tells us that we cannot simply set aside cognitive reflection in view of student performance in the PFI questions. The identification of the forces acting on a given body, as simple as it may sound, is one of the most basic skills necessary in the study of dynamics. Yet even the acquisition of this very basic skill is plagued by misconceptions or misleading preconceptions, directly or indirectly from everyday experiences, that form our intuition. Instances of these concepts include the requirement of a force to sustain the motion of a body (in a vacuum) and the existence of a centrifugal force (in an inertial frame). When confronted with questions asking for the set of forces acting on a given body, a low level of cognitive reflection may lead to the inclusion of intuitive but misleading forces. Deep thinkers, on the other hand, may rule out these wrong forces, effectively narrowing down the set of prospective correct answers.

Figure 4:
Distribution of scores for the set of PFI questions in the FCI with respect to CRT scores. The errorbars are set one standard error of the mean above and below the mean.
Having said this, our range of the Pearson coefficient is evidently in the lower half of its absolute spectrum from 0 to 1. Similar to that for the FCI as a whole, we contend that on top of the ability to shift from System 1 to System 2 is the need for a deep understanding of Physics concepts or ideas to pinpoint the right set of forces acting on a given body. In other words, we see a high level of cognitive reflection as some sort of initial "push" needed to submerge one into a sea of thought; complemented by will, natural talent, and/or industry, it may lead students to acquire the right understanding or decision in solving physics problems, be it as basic as force identification or as complex as flying a real rocket.

Before we leave this sub-section, we take cognizance of the downshifted range of the Pearson coefficient for PFI score vs. CRT score with respect to that for FCI score (as a whole) vs. CRT score; that is, from [0.09, 0.38] to [-0.06, 0.25]. Although the intervals still overlap, we take the liberty of accounting for the possible difference (hopefully to be resolved in future studies). Figure 5 shows the distribution of the number of students who got the correct answer for each FCI question. Based on the figure, our students found FCI item numbers 5, 11, 13, 14, 17, 18, 21, 25, 26, and 30 to be the top 10 most challenging FCI questions. Of the six-item set of PFI questions, five belong to these top 10 questions. We may see, then, that on average the subjects of our study found the set of PFI questions to be more challenging than the rest of the FCI questions, and this possibly caused the downward shift in the correlation coefficient. We are inclined to think that cognitive reflection stands as an important contributing factor in student performance in the set of PFI questions and in the FCI as a whole. When students activate System 2, this is where the other factors associated with deep understanding of physics come into play.
The influence of these other factors becomes more evident with the level of difficulty faced by students in answering a given set of physics problems.
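The "lowest 10" identification in Fig. 5 amounts to sorting items by their number of correct answers; a sketch with hypothetical per-item counts (the study's actual counts are in the figure):

```python
# Hypothetical number of correct answers per FCI item (item -> count);
# only a subset of the 30 items is shown for brevity.
correct_counts = {5: 30, 11: 25, 13: 28, 14: 35, 17: 33, 18: 22,
                  21: 36, 25: 34, 26: 37, 29: 60, 30: 27, 1: 120, 2: 95}

# The ten most challenging items are those with the fewest correct answers.
hardest = sorted(correct_counts, key=correct_counts.get)[:10]
print(hardest)  # items listed in increasing order of correct answers
```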
Figure 5: Distribution of the number of students who got the correct answer for each question in the FCI. The top 10 (out of 30) most challenging problems based on student performance are identified in the graph as the 'lowest 10'. Of these 10 questions, five belong to the set of PFI questions.

The operation of the human mind is still one of the most complex processes, far from our complete comprehension. From the perspective of a Physics Educator, there is a need to understand it so as to optimize the student learning experience. But as the goal post of complete understanding is still far beyond the horizon, we are delighted by large strides leading to this end. Dual Process Theory may be seen as one of these strides, telling us of a significant simplification employing two systems of thought processes: one that is intuitive (System 1) and the other analytic (System 2). The Cognitive Reflection Test is a good measuring instrument of cognitive reflection, indicating the tendency to shift from System 1 to System 2. The use of the CRT offers an attractive avenue to look into the relationship between cognitive reflection and student performance in Physics tests such as the FCI.

In this study, we looked into the idea that students who can easily transition from System 1 to System 2, or start right away with System 2 in DPT, may be able to perform better in the set of polarizing force-identification questions in the FCI and in the FCI as a whole. The result of our analysis of tests involving 163 students is suggestive that cognitive reflection is one of the contributing factors involved in student performance in the FCI and the set of PFI questions. Our insight is that a high level of cognitive reflection may enable students to cross out seemingly correct choices in the test that are normally part of our intuition from everyday experience or "common sense" but otherwise misleading. Complementary to cognitive reflection are other factors associated with deep understanding in physics. These become more evident with the level of difficulty of a given set of physics problems. We find that the range of the correlation coefficient between CRT score and PFI score is downshifted with respect to that for CRT score and FCI score. The possible difference may be attributable to the set of PFI questions being more challenging on average compared to the rest of the FCI questions.

Looking ahead, with all its efforts, this study has only covered a small (but significant) portion of student learning and performance in physics in relation to cognitive reflection and DPT. From here, we foresee future studies involving further tests, hopefully with a larger number of participants. Other Physics inventories or tests related to scientific reasoning ability (see Refs. [24, 25]) may be explored to find correlations with the level of cognitive reflection. Considering the CRT, a higher-resolution test (see the proposed expanded CRT in Ref. [26]), in tandem with an expanded PFI questionnaire, may be used to better resolve differences in the level of cognitive reflection. Regarding the administration of the CRT, the use of "mild subterfuge" as in Ref. [9] may be employed and studied to see its effect on the correlation between cognitive reflection and the FCI or any other standardized Physics Test.
References

[1] W. De Neys (editor). Dual Process Theory 2.0. Routledge, an imprint of the Taylor & Francis Group, 2018.
[2] K. E. Stanovich and R. F. West. Individual Differences in Reasoning: Implications for the Rationality Debate? Cambridge University Press, 2001.
[3] D. Kahneman. Thinking, Fast and Slow. Nota, 2013.
[4] S. Frederick. Cognitive Reflection and Decision Making. Journal of Economic Perspectives, (4), 25–42, 2005.
[5] M. E. Toplak, et al. The Cognitive Reflection Test as a Predictor of Performance on Heuristics-and-Biases Tasks. Memory & Cognition, (7), 1275–1289, 2011.
[6] J. Oechssler, et al. Cognitive Abilities and Behavioral Biases. Journal of Economic Behavior & Organization, (1), 147–152, 2009.
[7] J. M. Liberali, et al. Individual Differences in Numeracy and Cognitive Reflection, with Implications for Biases and Fallacies in Probability Judgment. Journal of Behavioral Decision Making, (4), 361–381, 2011.
[8] E. I. Hoppe and D. J. Kusterer. Behavioral Biases and Cognitive Reflection. SSRN Electronic Journal, 2009.
[9] A. K. Wood, et al. Can Dual Processing Theory Explain Physics Students' Performance on the Force Concept Inventory? Physical Review Physics Education Research, (2), 2016.
[10] M. Kryjevskaia, M. R. Stetzer, and N. Grosz. Answer first: Applying the heuristic-analytic theory of reasoning to examine student intuitive thinking in the context of physics. Phys. Rev. ST Phys. Educ. Res., 10, 020109, 2014.
[11] C. R. Gette, et al. Probing Student Reasoning Approaches through the Lens of Dual-Process Theories: A Case Study in Buoyancy. Physical Review Physics Education Research, (1), 2018.
[12] D. Hestenes, M. Wells, and G. Swackhamer. Force Concept Inventory. Phys. Teach., (3), 141–158, 1992.
[13] A. Savinainen and P. Scott. The Force Concept Inventory: a tool for monitoring student learning. Phys. Educ.
[14] A. L. Alinea and W. Naylor. Physics Education, (2), 210–217, 2015.
[15] A. L. Alinea and W. Naylor. Gender Gap and Polarisation of Physics on Global Courses. Physics Education, (2), 2017.
[16] R. R. Hake. Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. Am. J. Phys., 1998.
[17] Behavior Research Methods, (5), 1953–1959, 2017.
[18] P. Brañas-Garza, et al. Cognitive Reflection Test: Whom, How, When. Journal of Behavioral and Experimental Economics.
[19] ZipGrade (test-grading application).
[20] Frontiers in Psychology, 2015.
[21] K. L. Wuensch (2019). Comparing Correlation Coefficients, Slopes, and Intercepts. http://core.ecu.edu/psyc/wuenschk/docs30/CompareCorrCoeff.pdf, retrieved Mar 15, 2020.
[22] I. A. Halloun and D. Hestenes. Common Sense Concepts about Motion. American Journal of Physics, (11), 1056–1065, 1985.
[23] M. Finegold and P. Gorsky. Learning about Forces: Simulating the Outcomes of Pupils' Misconceptions. Instructional Science, (3), 251–261, 1988.
[24] V. P. Coletta and J. A. Phillips. Interpreting FCI scores: Normalized gain, preinstruction scores, and scientific reasoning ability. American Journal of Physics, 1172, 2005.
[25] S. Ates and E. Cataloglu. The Effects of Students' Reasoning Abilities on Conceptual Understandings and Problem-Solving Skills in Introductory Mechanics. European Journal of Physics, (6), 1161–1171, 2007.
[26] M. E. Toplak, R. F. West, and K. E. Stanovich. Assessing miserly information processing: An expansion of the Cognitive Reflection Test. Thinking & Reasoning, 20, 2014.