Exploring the Structure of Misconceptions in the Force and Motion Conceptual Evaluation with Modified Module Analysis

Abstract

Investigating student learning and understanding of conceptual physics is a primary research area within Physics Education Research (PER). Multiple quantitative methods have been employed to analyze two commonly used mechanics conceptual inventories: the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). Recently, researchers have applied network analytic techniques to explore the structure of the incorrect responses to the FCI, identifying communities of incorrect responses that could be mapped onto common misconceptions. In this study, the method used to analyze the FCI, Modified Module Analysis (MMA), was applied to a large sample of FMCE pretest and post-test responses (N_pre = 3956, N_post = 3719). The communities of incorrect responses identified were consistent with the item groups described in previous works. As in the work with the FCI, the network was simplified by retaining only nodes selected by a substantial number of students. Retaining nodes selected by 20% of the students produced communities associated with only four misconceptions. The incorrect response communities identified for men and women were substantially different, as was the change in these communities from pretest to post-test. The 20% threshold was far more restrictive than the 4% threshold that generated similar structures for the FCI in the prior work. Retaining nodes selected by 5% or 10% of students generated a large number of complex communities. The communities identified at the 10% threshold were generally associated with common misconceptions, producing a far richer set of incorrect communities than the FCI; this may indicate that the FMCE is a superior instrument for characterizing the breadth of student misconceptions about Newtonian mechanics.


arXiv [physics.ed-ph]

James Wells, Rachel Henderson, Adrienne Traxler, Paul Miller, and John Stewart

College of the Sequoias, Science Division, Visalia CA, 93277
Michigan State University, Department of Physics and Astronomy, East Lansing MI, 48824
Wright State University, Department of Physics, Dayton OH, 45435
West Virginia University, Department of Physics and Astronomy, Morgantown WV, 26506

(Dated: January 7, 2020)
I. INTRODUCTION

Understanding the common difficulties students exhibit in learning conceptual physics has been an important research strand in physics education research (PER) since its inception. This work was greatly advanced by the introduction of multiple-choice conceptual instruments measuring students’ understanding of mechanics and of electricity and magnetism: the Force Concept Inventory (FCI) [1], the Force and Motion Conceptual Evaluation (FMCE) [2], the Conceptual Survey of Electricity and Magnetism (CSEM) [3], and the Brief Electricity and Magnetism Assessment (BEMA) [4]. Studies involving these instruments continue to be of central importance in PER. For an overview of the history of these instruments and their use in PER, see Docktor and Mestre’s extensive synthesis of the field [5].

Recently, substantial efforts have been made to apply quantitative techniques to further understand these instruments, including factor analysis [6–8], cluster analysis [9], and item response theory [10–13]. In 2016, Brewe, Bruun, and Bearden [14] introduced a new class of quantitative algorithms, network analytic methods [15, 16], to analyze the incorrect answers. Network analysis is a broad, flexible, and extremely productive field of quantitative analysis that has been used to analyze systems as diverse as the functional networks in the brain [17] and the passing patterns of soccer teams [18].

A network is formed of nodes which are connected by edges. Network analysis seeks to identify structure within the network; one important class of structure is subsets of the network which are more interconnected within themselves than they are connected to the rest of the network. These subsets are called “modules” or “communities” interchangeably. In anticipation of the “igraph” package [19] in the “R” software system [20] becoming the primary tool used within PER for network analysis, we will call the subgroups “communities.”

Wells et al. [21] attempted to replicate the analysis of Brewe et al. [14] for the FCI and found that it did not scale to large datasets. They suggested a modified algorithm called Modified Module Analysis (MMA); the details are discussed below as Study 1. In the current study, the MMA algorithm was applied to explore the community structure of the FMCE; the results are then compared to the results of Study 1.

This study sought to answer the following research questions:

RQ1:

What incorrect answer communities are identified by Modified Module Analysis in the FMCE?

RQ2:

How are these communities different pre- and post-instruction? How is the community structure different for men and women?

RQ3:

How do the communities change as the parameters of the MMA algorithm are modified?

RQ4:

How do the communities detected compare to those detected in the FCI in Study 1?

A. The FMCE Instrument

The FMCE is a widely used mechanics conceptual inventory that measures students’ understanding of force and motion. The instrument consists of 43 items examining student understanding of Newton’s laws of motion. The items are presented in groups, with each item having at least 6 possible responses, some of which represent common misconceptions. Most items include a “none of the above” response, which is not the correct response to any item; “none of the above” responses have been shown to cause psychometric problems [22]. The FMCE is available at PhysPort [23].

The FMCE uses the practice of “blocking” or “chaining” items, where multiple items refer to a common stem. In an item block, a physical system is introduced, then multiple items refer to that system. Of the 43 items in the FMCE, all but one (item 39) are included in item blocks. The FCI also employs item blocks, with 13 of its 30 items included in blocks. Multiple studies have suggested that blocking items introduces spurious correlations that can make the instrument difficult to interpret statistically [12, 21].

Since its introduction, the blocked structure of the FMCE has been used to provide a compact description of the instrument in terms of the qualitative features of the item blocks. This description has been refined since the introduction of the instrument, as will be discussed in Sec. II A. The descriptive terms provide an overview of the instrument. “Force Sled” items (items 1-7) ask about the force that an individual would need to exert on a sled on a low-friction surface to produce a set of accelerations; students select from a number of textual responses. “Cart on a Ramp” items (items 8-10) ask students to select the force on a cart as it moves up and down an incline. “Coin Toss - Force” items (items 11-13) ask students to select the force on a coin tossed in the air. “Force Graph” items (items 14-21) ask students about the force on a toy car as it moves across a low-friction surface; students select from a number of graphs.
“Acceleration Graph” items (items 22-26) ask students to select the graph which correctly represents the acceleration of a toy car moving on a horizontal surface. “Coin Toss - Acceleration” items (items 27-29) ask students to select the acceleration of a coin tossed in the air. “Newton III” items (items 30-39) ask students about the forces during a variety of interactions between cars and trucks. “Velocity Graph” items (items 40-43) ask students to select the graph which correctly represents the velocity of a toy car moving on a horizontal surface. The current version of the FMCE has four multiple-choice “Energy” items (items 44-47) and one free-response item (46a). These items were not present in the original FMCE and will not be analyzed in this study.
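The descriptive grouping above can be encoded as a lookup table, which is convenient for labeling incorrect-response nodes such as 22E (response E to item 22) by their item group. This is a minimal sketch; the names `FMCE_ITEM_GROUPS`, `item_group`, and `group_of_response` are hypothetical helpers, not part of any published FMCE tooling. The ranges follow the descriptive groups listed above, so item 39, which is not part of a block, is still listed under Newton III.

```python
# Descriptive item groups for FMCE items 1-43 (Energy items 44-47 excluded).
FMCE_ITEM_GROUPS = {
    "Force Sled": range(1, 8),
    "Cart on a Ramp": range(8, 11),
    "Coin Toss - Force": range(11, 14),
    "Force Graph": range(14, 22),
    "Acceleration Graph": range(22, 27),
    "Coin Toss - Acceleration": range(27, 30),
    "Newton III": range(30, 40),
    "Velocity Graph": range(40, 44),
}


def item_group(item: int) -> str:
    """Return the descriptive group name for an FMCE item number (1-43)."""
    for name, items in FMCE_ITEM_GROUPS.items():
        if item in items:
            return name
    raise ValueError(f"item {item} is not among the analyzed items 1-43")


def group_of_response(label: str) -> str:
    """Map a response label such as '22E' (item number + letter) to its group."""
    return item_group(int(label[:-1]))
```

For example, `group_of_response("22E")` gives "Acceleration Graph" and `group_of_response("1A")` gives "Force Sled".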

B. Prior Studies

As this analysis was motivated by prior works, this research will draw heavily from two previous studies, which will be referenced as Study 1 and Study 2 throughout the manuscript.

1. Study 1: Modified Module Analysis

In Study 1, Wells et al. [21] introduced Modified Module Analysis (MMA), a network analytic method to explore the structure of the incorrect answers of a multiple-choice instrument. Modified Module Analysis was introduced to adapt the Module Analysis of Multiple-Choice Responses (MAMCR) method of Brewe et al. [14] for large datasets. In both MMA and MAMCR, the incorrect responses to a conceptual inventory are used to define a network with weighted edges. The responses are the nodes of the network. In MAMCR, the number of times two responses are selected by the same student defines the edge weight of the network. For example, if FCI responses 1D and 2B were selected together by 40 students, the network would contain 1D and 2B as nodes and have an edge between the nodes with weight 40. The notation 1D represents response “D” to item 1. In MMA, the edge weight is the correlation coefficient between the two responses.

To analyze this network, the correlation matrix was calculated and a threshold applied. In Study 1, only edges which were correlated at the r > 0.2 level were retained, where r is the correlation coefficient. The remaining correlated items define a network with edge weight equal to the correlation. A community detection algorithm was then applied to detect substructure in the network. A community represents a set of nodes that are preferentially selected together by many students. The MMA algorithm detects incorrect answer communities, subsets of the network formed of incorrect answers which are preferentially selected together. Modified Module Analysis identified 9 pretest communities and 11 post-test communities on the FCI. Three of the communities were the result of blocked items. For these blocked items, the later response was the correct response if an earlier response had been correct. In most cases, the remaining communities could be related to the misconceptions associated with the items in the original paper introducing the FCI [1] and in the more detailed taxonomy provided by Hestenes and Jackson [24].
For eight of the communities, a dominant misconception was identified, and for two of the communities, two common misconceptions were identified. For example, one FCI community included responses { }, common incorrect answers to the Newton’s 3rd law items. Students were applying both the greater mass implies greater force and the most active agent produces greater force misconceptions for these items.

Study 1 found that the communities identified for men and women on both the pretest and post-test, while not identical, were very similar.
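The edge-weight definitions above, together with the thresholding, community detection, and bootstrap steps described in Sec. IV B, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in Study 1 or in this work: those analyses used the fast-greedy algorithm from the R igraph package, whereas the hypothetical `greedy_modularity` below is a naive agglomerative modularity maximization in the same spirit (repeatedly merging the pair of communities with the largest modularity gain). The Bonferroni-corrected significance test is omitted for brevity, and all function names are hypothetical. Each incorrect response (for example, 1D) is represented as a 0/1 column over students.

```python
import math
import random


def co_count(x, y):
    """MAMCR-style edge weight: number of students who selected both responses."""
    return sum(a * b for a, b in zip(x, y))


def phi(x, y):
    """MMA-style edge weight: Pearson correlation of two 0/1 columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def build_network(responses, min_frac=0.05, r_min=0.2):
    """Keep responses chosen by >= min_frac of students; keep edges with r > r_min."""
    n = len(next(iter(responses.values())))
    nodes = sorted(k for k, col in responses.items() if sum(col) / n >= min_frac)
    edges = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            r = phi(responses[u], responses[v])
            if r > r_min:
                edges[(u, v)] = r
    return nodes, edges


def greedy_modularity(nodes, edges):
    """Merge communities while some merge raises weighted modularity Q."""
    comms = {i: {v} for i, v in enumerate(nodes)}
    total = sum(edges.values())  # W: total edge weight
    if total == 0:
        return list(comms.values())
    label = {v: i for i, v in enumerate(nodes)}
    degree = dict.fromkeys(comms, 0.0)
    between = {}  # (i, j) with i < j -> summed edge weight between communities
    for (u, v), w in edges.items():
        a, b = sorted((label[u], label[v]))
        between[(a, b)] = between.get((a, b), 0.0) + w
        degree[a] += w
        degree[b] += w
    while True:
        # Modularity gain of merging a and b: w_ab / W - d_a * d_b / (2 W^2)
        best, pair = 0.0, None
        for (a, b), w in between.items():
            gain = w / total - degree[a] * degree[b] / (2 * total ** 2)
            if gain > best:
                best, pair = gain, (a, b)
        if pair is None:
            return list(comms.values())
        a, b = pair
        comms[a] |= comms.pop(b)
        degree[a] += degree.pop(b)
        merged = {}
        for (x, y), w in between.items():
            if (x, y) != (a, b):  # the merged pair's weight becomes internal
                x, y = sorted((a if x == b else x, a if y == b else y))
                merged[(x, y)] = merged.get((x, y), 0.0) + w
        between = merged


def community_fraction(responses, n_boot=100, seed=0):
    """Fraction C of bootstrap replications in which each pair shares a community."""
    rng = random.Random(seed)
    n = len(next(iter(responses.values())))
    together = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample students
        sample = {k: [col[i] for i in idx] for k, col in responses.items()}
        for comm in greedy_modularity(*build_network(sample)):
            members = sorted(comm)
            for i, u in enumerate(members):
                for v in members[i + 1:]:
                    together[(u, v)] = together.get((u, v), 0) + 1
    return {pair: count / n_boot for pair, count in together.items()}
```

Because `build_network` keeps only edges with r > 0.2, negative correlations (such as those between two responses to the same item) never connect nodes and so cannot pull responses into the same community.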

2. Study 2: Multidimensional Item Response Theory and the FMCE

Study 1 made extensive use of a prior study of the FCI that applied constrained Multidimensional Item Response Theory (MIRT) to produce a detailed model of the physical reasoning required to correctly solve the items in the instrument [12]. The incorrect communities not related to the blocking of items often required similar physical reasoning for their solution. This methodology has recently been extended to the FMCE and will be referenced as Study 2. In Study 2, Yang et al. performed a detailed analysis of the correct answers to the FMCE using constrained MIRT [25]. This technique produced a detailed model of the instrument in terms of the fundamental reasoning steps (principles) required for its solution. Results of factor analysis and correlation analysis were also presented. All analyses suggested the existence of subsets of items within the instrument that shared a common solution structure. These item groups included items 40-43 (definition of velocity), 22-26 (definition of acceleration), 30-39 (Newton’s 3rd law), and 8-13 and 27-29 (motion under gravity). A fifth group of items, items 1-7 and 14-20, measured a combination of Newton’s 1st and 2nd laws and corollaries of motion derived from these laws. These items presented responses to students using different representations, with items 1-7 asking students to select textual responses and items 14-20 asking students to choose between two-dimensional graphs. The constrained MIRT analysis found that this distinction between textual and graphical responses was important to understanding student answers to the instrument.

The groups identified as requiring a common solution structure are well aligned with the item groups identified by previous research and described in Sec. I A, supporting the identification of these groups as measuring distinct elements of Newtonian thinking. Some of the groups suggested by MIRT combine groups suggested by previous authors.
For example, “Cart on a Ramp,” “Coin Toss - Force,” and “Coin Toss - Acceleration” items all require an understanding of the force or acceleration due to gravity for their solution. In the analysis that follows, item groups with similar correct solution structure will often also have responses that represent consistently applied misconceptions.

In general, the FMCE had many more items requiring similar reasoning for their solution than the FCI; this may make it a productive instrument for exploring the structure of misconceptions about mechanics using MMA.

II. PREVIOUS STUDIES OF THE FMCE

A. General Analyses

Multiple subdivisions of the FMCE have been suggested. Thornton and Sokoloff introduced four subgroups of items with the original publication of the instrument: “Force Sled” items, “Cart on a Ramp” items, “Coin Toss” items, and “Force Graph” items [2], as described above. Items 5, 6, and 15 were identified as potentially problematic, leading to modified subgroups: “Force Sled” items (items 1-4 and 7) and “Force Graph” items (items 14 and 16-21).

Using data collected after the instrument’s publication, Thornton et al. proposed an alternate scoring scheme which eliminated some items and scored some groups of items (clusters) together [26]. The alternate scoring scheme suggested that item groups 8-10, 11-13, and 27-29 be scored as clusters because students had not mastered the concept tested by a group unless they answered every item in the group correctly. Each cluster received two points if all of its items were answered correctly and zero points otherwise. Thornton et al. also suggested eliminating items 5, 15, 33, 35, 37, and 39 because students without an understanding of Newtonian mechanics often answered them correctly, and eliminating item 6 because content experts often answered it incorrectly.

Multiple authors proposed other revisions to the subgroups of items initially introduced by Thornton and Sokoloff. Wittmann identified five subgroups: “Force (Newton I and II)” (items 1-4, 7-14, 16-21), “Acceleration” (items 22-29), “Newton III” (items 30-32, 34, 36, 38), “Velocity” (items 40-43), and “Energy” (items 44-47) [27]. These subgroups were further refined using a resource framework by Smith and Wittmann, who proposed a set of seven subgroups: “Force Sled” (items 1-4, 7), “Reversing Direction” (items 8-13, 27-29), “Force Graphs” (items 14, 16-21), “Acceleration Graphs” (items 22-26), “Newton III” (items 30-32, 34, 36, 38), “Velocity Graphs” (items 40-43), and “Energy” (items 44-47) [27]. The problematic items identified by Thornton et al. were eliminated from all subgroups in these two studies. More recently, Smith, Wittmann, and Carter applied the revised subgroup structure to understand the effect of instruction [28].
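The cluster rule of the alternate scoring scheme described above can be sketched as a small scoring function. Only the cluster rule (two points when every item in a cluster is correct, zero otherwise) and the list of eliminated items come from the text; awarding one point to each remaining item is an assumption made here for illustration, and `thornton_style_score` is a hypothetical name, not the published scoring key.

```python
# Clusters and eliminated items as described for the Thornton et al. scheme.
CLUSTERS = [(8, 9, 10), (11, 12, 13), (27, 28, 29)]
ELIMINATED = {5, 6, 15, 33, 35, 37, 39}


def thornton_style_score(correct):
    """Score one student; `correct` maps item number (1-43) to True/False.

    ASSUMPTION: each item that is neither clustered nor eliminated earns one
    point; consult the published scheme for the exact scoring key.
    """
    clustered = {item for cluster in CLUSTERS for item in cluster}
    # A cluster earns 2 points only if every item in it is answered correctly.
    score = sum(2 for cluster in CLUSTERS
                if all(correct.get(item, False) for item in cluster))
    for item in range(1, 44):
        if item not in clustered and item not in ELIMINATED:
            score += 1 if correct.get(item, False) else 0
    return score
```

Under this assumption, a student who answers items 8 and 9 correctly but item 10 incorrectly earns nothing for that cluster, and answers to eliminated items such as item 5 never change the score.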

B. Exploratory Analyses

Many studies have applied quantitative analysis methods to explore the structure of conceptual physics instruments. A substantial number of studies have explored the factor structure of the FCI, generally finding inconsistent or unintelligible results [6, 7, 11, 12, 29].

Only two studies have performed factor analysis on the FMCE. Ramlo examined the reliability of the FMCE using a sample of 146 students [30], finding adequate reliability on the pretest (Cronbach’s α = 0.…) … α = 0.… … α = 0.66 to α = 0.…

C. Gender and the FMCE

On mechanics conceptual inventories (the FCI and the FMCE), men, on average, outperform women by 13% on the pretest and 12% on the post-test [33]. The majority of research into the “gender gap” in PER analyzes differences between men and women on the FCI; however, some studies have explored these differences on the FMCE.

Researchers have explored various factors that could explain the differences between men and women on the FMCE. For example, differences in academic backgrounds and preparation, measured by FMCE pretest and math placement exam scores, have been shown to explain much of the gender gap on the FMCE post-test [34, 35]. Studies have also investigated the impact of interactive engagement on the overall gender gap. Although some studies have shown a positive impact, reducing the differences between men and women on conceptual inventory scores [34, 36, 37], other researchers have demonstrated that the gender gap for students enrolled in an interactive-engagement classroom is unchanged [38].

While many studies have focused on the overall average gender differences on the FMCE, researchers have recently explored the fairness of the individual items on the FMCE [39]. An item is fair if men and women of equal overall ability with the material score equally on the item. Applying the modified scoring method proposed by Thornton et al. [26], only item cluster 27-29, scored as a single item, consistently showed substantial unfairness in multiple samples; this item was unfair to men. In one of the two samples, item 40 demonstrated substantial gender unfairness; this item was also unfair to men. These results were substantially different from the analysis performed by Traxler et al., which identified a large number of unfair items on the FCI; most of these items were unfair to women [40].
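The fairness definition above, comparing men and women of equal overall ability, can be made concrete by matching students on a score level and comparing item performance within each level. The sketch below uses the Mantel-Haenszel common odds ratio, a standard differential item functioning statistic; it is not necessarily the procedure used in [39], and the record format, the choice of men as the reference group, and the function name are hypothetical. A value near 1 suggests the item is fair; values above 1 favor the reference group.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records, reference="men"):
    """Common odds ratio across ability strata for a single item.

    records: iterable of (group, stratum, correct), where stratum is a
    matching variable such as total score and correct is True/False.
    Values > 1 mean the reference group outperforms matched students in
    the other (focal) group on this item.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        offset = 0 if group == reference else 2
        cells[stratum][offset + (0 if correct else 1)] += 1
    num = den = 0.0
    for a, b, c, d in cells.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den
```

Matching on ability is what distinguishes this from a raw score comparison: a group difference in overall preparation alone does not make an item unfair.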

D. The FCI and the FMCE

While both the FCI and the FMCE measure an understanding of Newtonian mechanics, the FCI covers the topic substantially more broadly. The FCI includes two-dimensional kinematics and circular motion, while the FMCE does not. Thornton et al. [26] quantified this difference in coverage, noting that 22 of the 30 FCI items were outside the coverage of the FMCE.

The optimal model presented in Study 2 and a similar study of the FCI [12] provide further evidence for the difference in coverage of the two instruments, with the optimal model of the FCI requiring 19 principles (fundamental reasoning steps) while the optimal model of the FMCE required only 8 principles. The two instruments also differed starkly in their re-use of principles, with the FCI rarely repeating the same set of principles on multiple items and the FMCE often repeating the same principles. Study 2 also provided partial support for Thornton et al.’s [26] identification of problematic items, with items 5, 6, 33, 35, and 37 having relatively small discriminations and item 15 having negative discrimination. The models in Study 2 also suggest items 20 and 21 may not be appropriately grouped with the other items probing graphical interpretation of forces.
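Discrimination describes how strongly the chance of answering an item correctly rises with overall ability; a negative discrimination, as reported for item 15, means stronger students were less likely to answer it correctly. The discriminations in Study 2 are parameters of a constrained MIRT model; the sketch below instead uses the classical item-rest correlation as a simpler analogue, only to illustrate the sign convention. The function name and data layout are hypothetical.

```python
import math


def item_rest_correlation(scores, item):
    """Correlate one item's 0/1 scores with each student's score on the other items.

    scores: list of per-student lists of 0/1 item scores.
    """
    x = [s[item] for s in scores]
    rest = [sum(s) - s[item] for s in scores]  # exclude the item itself
    n = len(scores)
    mx, mr = sum(x) / n, sum(rest) / n
    cov = sum((a - mx) * (b - mr) for a, b in zip(x, rest))
    vx = sum((a - mx) ** 2 for a in x)
    vr = sum((b - mr) ** 2 for b in rest)
    return cov / math.sqrt(vx * vr) if vx and vr else 0.0
```

In a toy dataset where only the weakest students answer item 0 correctly, `item_rest_correlation(scores, 0)` is negative, mirroring the behavior attributed to item 15.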

III. THE STRUCTURE OF KNOWLEDGE

The MMA algorithm detects sets of incorrect answers that are commonly selected together by multiple students. Study 1 showed that, for the FCI, these incorrect answer communities were related either to misconceptions proposed by the authors of the FCI or to the practice of blocking items. Why students answer physics questions incorrectly is a broad area of research, and multiple frameworks have been developed to explain incorrect answering.

A. Knowledge Frameworks

Much of the early work in PER conceptualized patterns of incorrect answers as “misconceptions,” coherently applied incorrect reasoning often related to Aristotelian or medieval theories of nature. Early research investigated common student difficulties in applying Newtonian mechanics [41–47]. As the field evolved, systematic studies were developed to explore student understanding and epistemology [2, 48–51].

Eventually, alternate frameworks not involving misconceptions were proposed. Two of the most prominent frameworks are knowledge-in-pieces [52, 53] and ontological categories [54–56]. Knowledge-in-pieces models student thinking as resulting from the application of a set of granular pieces of reasoning which are used independently or collectively to solve problems. Multiple authors have investigated this model, and these reasoning pieces have been called phenomenological primitives (p-prims) [52, 53], resources [57–59], and facets of knowledge [60]. In the knowledge-in-pieces model, misconceptions represent consistently activated p-prims. Unlike the misconception view, the knowledge-in-pieces model views p-prims as potentially positive resources that can be activated as part of the process of constructing knowledge.

For a careful and accessible exploration of the relation of and differences between the misconception view and the knowledge-in-pieces framework, see Scherr [61]; the current study applies the definitions from this work. The misconception model is defined as “a model of student thinking in which student ideas are imagined to be determinant, coherent, context-independent, stable, and rigid” [61]. The knowledge-in-pieces framework models student ideas “as being at least potentially truth-indeterminate, independent of one another, context-dependent, fluctuating, and pliable” [61].

The ontological category framework differs substantially from either the misconception view or the knowledge-in-pieces view.
The ontological category framework models incorrect reasoning as resulting from an incorrect classification of a concept. For example, a student might misclassify force as a quantity that can be used up, which might lead the student to believe an object would slow when the applied force was removed.

B. Misconceptions

The FCI was developed using the misconceptions model; Hestenes, Wells, and Swackhamer proposed a detailed taxonomy of the misconceptions measured by the instrument [1]. The taxonomy was developed from qualitative studies investigating students’ “alternate view of the relationship between force and acceleration,” in which researchers interviewed students about their difficulties while solving conceptual physics problems [62–64]. The authors of the FCI provided a detailed description of the misconceptions measured by the instrument [1]; this taxonomy was later refined by Hestenes and Jackson [24]. The analysis in the current work demonstrates that the FMCE probes a limited number of the misconceptions that were originally outlined by the authors of the FCI; only these misconceptions are described below. For more information about the other misconceptions probed by the FCI, see Study 1.

Velocity-Acceleration Undiscriminated.

The misconception of velocity-acceleration undiscriminated stems from the concept of “motion is vague” [1]. This misconception reflects an inability to differentiate the concepts of position, velocity, and acceleration within kinematics. For example, items 22-26 on the FMCE refer to a car moving on a horizontal surface and ask for the acceleration as a function of time. The velocity-acceleration undiscriminated misconception would predict that when the car is speeding up or slowing down at a constant rate, the graph would show a linear trend of acceleration with respect to time, and that when the car is traveling at a constant velocity, the graph would show a non-zero constant acceleration.
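The contrast between the correct answers and the predicted misconception answers follows directly from the definition of acceleration as the time derivative of velocity. For straight-line motion of the car (a one-dimensional idealization assumed here):

```latex
\begin{aligned}
\text{constant acceleration } a_0:&\quad v(t) = v_0 + a_0 t, \qquad a(t) = \frac{dv}{dt} = a_0,\\
\text{constant velocity } v_0:&\quad v(t) = v_0, \qquad\qquad\;\; a(t) = \frac{dv}{dt} = 0.
\end{aligned}
```

A student who does not distinguish the two quantities effectively draws v(t) where a(t) is asked for, which produces the sloped line and the non-zero constant described above.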

Motion Implies Active Forces.

The motion implies active forces misconception is one of the sub-categories outlined under the “Active Forces” category of misconceptions described by the authors of the FCI [1]. This misconception implies that an object in motion, even if moving at constant velocity, will experience a force in the direction of motion; it demonstrates that Newton’s 2nd law is not well understood. For example, items 1-4 on the FMCE probe this misconception; a sled is being pushed along the ice, and students are asked to describe the force which would keep the sled moving. The motion implies active forces misconception would predict that force is proportional to velocity rather than acceleration.
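Written out for the sled, treating friction on the ice as negligible (an idealization assumed here), the two predictions differ exactly when the sled moves at a constant, non-zero velocity:

```latex
\begin{aligned}
\text{Newton's 2nd law:}\quad & F_{\text{net}} = m\,a = m\,\frac{dv}{dt}
  \quad\Rightarrow\quad v = \text{const} \;\text{gives}\; F_{\text{net}} = 0,\\
\text{motion implies active forces:}\quad & F \propto v
  \quad\Rightarrow\quad v = \text{const} \neq 0 \;\text{gives}\; F \neq 0.
\end{aligned}
```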

Action/Reaction Pairs.

The misconceptions “greater mass implies greater force” and “the most active agent produces the greatest force” are the two sub-categories within the “Action/Reaction Pairs” group of student difficulties. This group of misconceptions indicates that Newton’s 3rd law is not well understood. For example, FMCE items 30-32 probe these misconceptions by describing collisions between a heavy truck and a small car. The greater mass implies greater force misconception would predict that the heavy truck would exert a greater force on the small car than the small car would on the heavy truck. The most active agent produces the greatest force misconception would predict that the object that is moving the fastest would produce the greatest force.
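Newton’s 3rd law instead requires equal and opposite forces; combined with the 2nd law, it shows that what differs between the two vehicles is their acceleration, not the force. Assuming the collision force dominates all other horizontal forces during the collision:

```latex
\vec F_{\text{truck on car}} = -\vec F_{\text{car on truck}}
\quad\Rightarrow\quad
\frac{|\vec a_{\text{car}}|}{|\vec a_{\text{truck}}|}
  = \frac{|\vec F_{\text{truck on car}}| / m_{\text{car}}}{|\vec F_{\text{car on truck}}| / m_{\text{truck}}}
  = \frac{m_{\text{truck}}}{m_{\text{car}}} > 1 .
```

The car’s larger change in velocity is therefore a consequence of its smaller mass, not of a larger force acting on it.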

IV. METHODS

A. Sample

The sample was collected at a large eastern land-grant university serving approximately 30,000 students. The demographics of the undergraduate population at the university were 80% White, 6% International, 4% African-American, 4% Hispanic, 2% Asian, 4% two or more races, and other groups less than 1% [65]. The general undergraduate population had ACT scores ranging from 21 to 26 (25th to 75th percentile).

The data were collected in the introductory calculus-based mechanics course from Spring 2011 to Spring 2017. The majority of the students enrolled in this course were physical science and engineering majors. This sample was previously analyzed in Henderson et al. (Sample 3A [39]), where the instructional environment is described in detail. The course was taught by multiple instructors and generally featured an interactive pedagogy in lecture and laboratory.

Over the period studied, the FMCE was given at the beginning and at the end of the class in each semester. The sample contains 3956 FMCE pretest responses and 3719 FMCE post-test responses (each with 80% men); only the students who completed the course for a grade were included in the study. The overall pretest to post-test gains for men and women were 28% and 21%, respectively. The descriptive statistics for the FMCE pretest and the FMCE post-test are presented in Table II in Henderson et al. (Sample 3A) [39].

B. Analysis Methods

This work applies Modified Module Analysis (MMA), described in Study 1, to the FMCE. Although the method is described in detail in Study 1 [21], we provide an overview of the method here.

All responses to the FMCE were dichotomously coded: response 1D for student i was coded as one if student i selected the response and zero otherwise. The correct responses were eliminated; network analysis is unproductive if the correct responses are included because they form a single tightly connected community that hides the structure of the incorrect answers. Responses that were selected by fewer than 5% of the students were eliminated as statistically unreliable.

The correlation matrix was calculated for the remaining incorrect answers. This correlation matrix defines a network with nodes representing the incorrect responses and weighted edges between the nodes representing the strength of the correlation between the two responses. Edges representing correlations that were not significant at the α = 0.05 level with a Bonferroni correction applied were eliminated. The network was further simplified by eliminating any correlation where r < 0.2; this was the threshold applied in Study 1. This also served to remove the large negative correlations between two responses to the same item. Network analysis often uses methods to simplify the network while retaining important structure; this process is called “sparsification.”

A community detection algorithm was then applied to detect structure in the network. Study 1 applied the “fast-greedy” algorithm [66] included in the “igraph” package [19] for R. Many community detection algorithms exist; Study 1 reported that most produced similar results for the correlation network. The fast-greedy algorithm is designed to maximize the modularity of the division of the network into unified subnetworks. Modularity measures the number of intra-community edges in a particular division of the network compared to the number expected in a random division.

To account for randomness in both the sample and the algorithm, 1000 bootstrapped replications were carried out: the data were sampled with replacement 1000 times, and a division of the network into communities was calculated for each replication. For each pair of incorrect items, the number of times the two items appeared in the same community was calculated. This number is divided by the number of bootstrap replications to form the community fraction C. In this study, we analyzed communities that were identified in 80% of the 1000 bootstrapped samples.

Because the incorrect answer communities of men and women are compared and the number of men in the sample is significantly larger than the number of women, care was taken to produce a balanced sample. For men, the data were downsampled to the size of the female dataset. For women, the dataset was sampled with replacement, preserving the size of the dataset.

V. RESULTS

Modified Module Analysis was applied to the FMCE; the communities identified are shown in the first table in the Supplemental Materials [67]. Retaining nodes where at least 5% of the students selected the response (approximately the threshold used in Study 1) produced 35 communities. These communities were often formed of small subsets of item groups identified in previous studies. This was dramatically different from the small number of communities identified in the FCI by Study 1. The complex nature of the communities identified made understanding their structure difficult.

To produce a simpler structure more open to interpretation, the network was further sparsified, retaining only nodes selected by 20% of the students. The community structure of this network is shown in Table I. In nearly every case, the communities form completely disconnected, complete graphs. The intra-community density measures the connectivity of a community and is defined as γ = 2m/[n(n − 1)], where n is the number of nodes and m is the number of realized edges. A fully connected community has an intra-community density of one.

Table I offers partial support for the identification of items 5, 6, 15, 33, 35, 37, and 39 as problematic in Thornton et al. [26]. Items 20 and 21 were modeled as having a different solution structure from the other items in the “Force Graph” group in Study 2; these items are inconsistently connected to the other items in this group in Table I. Incorrect answers to items 15, 33, and 37 were never identified as part of a community. Incorrect answers to items 20, 21, 35, and 39 were inconsistently identified as parts of the communities associated with the items in the group. As such, some of the complexity in Table I results from these items. If items 5, 6, 15, 20, 21, 33, 35, 37, and 39 are eliminated from the analysis, the structure of Table I simplifies substantially to produce Table II. The communities in Table II are shown graphically in Fig. 1.

Table I. Communities identified in the pretest and post-test incorrect answers at r > 0.2, C > 0.8 … The intra-community density, γ, is given for communities where the intra-community density is not one. Newton III* denotes that this community does not contain 31F.

Community | Pretest (Men, Women) / Post-test (Men, Women) | Item Group
1A, 2B, 3C, 4G, 5B, 6C, 7E | X X X | Force Sled
1A, 2B, 3C, 4G, 5B, 6C, 7E, 14A, 16C, 17B, 18H, 19D, 20F | X (γ = 0.88) | Force Sled; Force Graph
8G, 9D, 10B, 11G, 12D, 13B | X X | Cart on a Ramp; Coin Toss - Force
8G, 9D, 10B, 11G, 12D, 13B, 27G, 28D, 29B | X X | Cart on a Ramp; Coin Toss - Force; Coin Toss - Acceleration
14A, 16C, 17B, 18H, 19D | X X X | Force Graph
22E, 23G, 24B, 25F, 26A, 27G, 28D, 29B | X | Acceleration Graphs; Coin Toss - Acceleration
22E, 23G, 24B, 25F, 26A | X X X | Acceleration Graphs
27G, 28D, 29B | X | Coin Toss - Acceleration
30A, 31F, 32B, 34B, 36C, 38B | X X X X | Newton III

The sets of items in Tables I and II generally conform to the item groups identified in previous works and discussed in Sec. I A. Table II suggests that items 27-29 should be treated as an independent group; we propose this group be called “Coin Toss - Acceleration” to distinguish it from items 11-13, which become “Coin Toss - Force.” Both sets of items ask about a coin tossed in the air; items 11-13 ask about the force on the coin, and items 27-29 ask about the acceleration. Smith and Wittmann combined these items into a “Reversing Direction” (items 8-13, 27-29) group; MMA suggests this grouping may not be appropriate for all students. We also note that Smith and Wittmann’s “Velocity Graphs” (items 40-43) group does not appear. This group had a relatively poor Cronbach alpha when used as a subscale in Study 2.

At this level of sparsification, for each item only a single response appeared in each community, indicating that there is a single, dominant incorrect answer that students

1A 2B3C4G7E 8G9D 10B11G 12D13B 14A16C17B18H 19D22E23G 24B25F 26A27G 28D29B 30A31F32B 34B36C38BPretest Men 1A2B 3C4G7E8G 9D10B 11G12D13B 14A16C 17B18H19D22E 23G24B 25F26A27G 28D29B 30A 31F32B 34B36C 38BPretest Women1A 2B3C4G7E 8G9D 10B11G 12D13B14A16C 17B18H19D 22E23G24B25F 26A27G28D29B30A32B34B 36C38B Post-test Men 1A 2B3C4G 7E8G 9D10B 11G12D 13B 14A16C17B 18H19D22E23G 24B25F26A27G28D 29B 30A32B 34B36C38BPost-test Women

Figure 1. Communities identiﬁed in the FMCE pretest and post-test for men and women. tend to select. This was consistent between the pretestand the post-test and by gender.
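The network-building steps described in the methods and above (keeping responses selected by a minimum fraction of students, linking responses whose correlation exceeds a threshold, scoring a partition by modularity, estimating the community fraction C by bootstrapping, and computing the intra-community density γ) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation, which used the fast-greedy algorithm from the igraph package for R. Each student is assumed to be represented by the set of incorrect responses selected (labels such as "22E"), and correlations are taken as Pearson (phi) coefficients of the 0/1 selection indicators. The parameter names `min_frac`, `r_min`, and `n_boot` are illustrative, and the default `r_min=0.2` is an assumption. Community detection is passed in as `detect`, a hypothetical callable standing in for fast-greedy or any other algorithm; `balanced_samples` encodes one assumed reading of the balancing step.

```python
import math
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length 0/1 lists (the phi coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant vectors: treat as uncorrelated


def sparsified_network(students, min_frac=0.2, r_min=0.2):
    """Node step: keep responses chosen by at least min_frac of students.
    Edge step: link two kept responses when their correlation exceeds r_min.

    students: one set of selected incorrect responses (e.g. {"22E", "30A"}) per student.
    """
    counts = Counter(resp for chosen in students for resp in chosen)
    nodes = sorted(r for r, c in counts.items() if c / len(students) >= min_frac)
    vec = {r: [int(r in chosen) for chosen in students] for r in nodes}
    edges = [(a, b) for a, b in combinations(nodes, 2)
             if pearson(vec[a], vec[b]) > r_min]
    return nodes, edges


def intra_density(members, edges):
    """gamma = 2m / (n(n - 1)): realized edges over possible edges inside a community."""
    members = set(members)
    n = len(members)
    if n < 2:
        raise ValueError("intra-community density needs at least two nodes")
    m = sum(1 for a, b in edges if a in members and b in members)
    return 2 * m / (n * (n - 1))


def modularity(edges, labels):
    """Newman modularity Q = sum over communities c of [e_c/m - (d_c/2m)^2],
    where e_c counts edges inside c and d_c sums the degrees of nodes in c."""
    m = len(edges)
    intra, degree = Counter(), Counter()
    for a, b in edges:
        degree[labels[a]] += 1
        degree[labels[b]] += 1
        if labels[a] == labels[b]:
            intra[labels[a]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def community_fraction(students, detect, n_boot=1000, seed=0):
    """C for each node pair: the share of bootstrap replications that put
    both nodes in the same community.

    detect: hypothetical stand-in for a community detection algorithm (for
    example fast-greedy); it maps a list of students to {node: label}.
    """
    rng = random.Random(seed)
    together = Counter()
    for _ in range(n_boot):
        sample = rng.choices(students, k=len(students))  # resample with replacement
        labels = detect(sample)
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                together[(a, b)] += 1
    return {pair: k / n_boot for pair, k in together.items()}


def balanced_samples(men, women, rng):
    """One assumed reading of the balancing step: men are downsampled without
    replacement to the women's sample size; women are resampled with
    replacement at their original size."""
    return rng.sample(men, len(women)), rng.choices(women, k=len(women))
```

In a full pipeline, `detect` would presumably rebuild the sparsified network from each resampled class and search for a high-modularity partition; node pairs whose fraction reaches the chosen cutoff (80% in this study) would then define the reported communities.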

A. The Structure of Incorrect FMCE Responses

Study 2 allows the description of the physical principles tested by each item group. Both “Force Sled” and “Force Graph” test a combination of Newton’s 1st and 2nd law and the definition of acceleration. The “Force Graph” items also require the use of graphical reasoning. The “Cart on a Ramp,” “Coin Toss - Force,” and “Coin Toss - Acceleration” groups each require the law of gravitation, that the gravitational force is downward and constant. The “Acceleration Graphs” group requires the definition of acceleration and reading a graph. The “Newton III” group requires Newton’s 3rd law.

In addition to the communities being strongly related to the item groups, often multiple item groups testing the same physical principles were part of the same community. Much of the complexity of Table II results from the inconsistent joining of incorrect answers to items testing the same concept. Table III summarizes the item groups, the physical principle tested by the group, and the common misconception selected for the group.

The misconceptions represented by the items in the incorrect communities are quite consistent. As in Study 1, we use Hestenes and Jackson’s extensive taxonomy of misconceptions measured by the FCI to classify the misconceptions [24]. The “Force Sled,” “Force Graph,” “Coin Toss - Force,” and “Cart on a Ramp” responses all represent the motion implies active forces misconception; all select a force proportional to the velocity. The “Acceleration Graphs” and “Coin Toss - Acceleration” groups both represent the velocity-acceleration undiscriminated misconception; all select an acceleration proportional to velocity.

Table III. Item groups, the physical principle tested by the group, and the common misconception selected by the students.

Item Group | Community | Physical Principle | Misconception
Force Sled | 1A, 2B, 3C, 4G, 7E | Newton’s 1st and 2nd law | Motion implies active forces
Cart on a Ramp | 8G, 9D, 10B | Motion under gravity | Motion implies active forces
Coin Toss - Force | 11G, 12D, 13B | Motion under gravity | Motion implies active forces
Force Graph | 14A, 16C, 17B, 18H, 19D | Newton’s 1st and 2nd law | Motion implies active forces
Acceleration Graphs | 22E, 23G, 24B, 25F, 26A | Definition of acceleration | Velocity-acceleration undiscriminated
Coin Toss - Acceleration | 27G, 28D, 29B | Motion under gravity | Velocity-acceleration undiscriminated
Newton III | 30A, 31F, 32B, 34B, 36C, 38B | Newton’s 3rd law | Greater mass implies greater force; most active agent produces greatest force

Study 1 found that the FCI presented the students with two misconceptions related to Newton’s 3rd law: greater mass implies greater force and most active agent produces greatest force. MMA was unable to disentangle the application of these two misconceptions for the FCI. Both misconceptions are also in the same community for the FMCE. Item 30A represents the greater mass implies greater force misconception. Items 32B, 34B, 36C, and 38B apply the most active agent produces greatest force misconception. Interestingly, item 31 gives the student a situation where both misconceptions apply: a head-on collision between a large truck and a faster moving car. Response 31F indicates the student does not believe they have enough information to solve the item, suggesting they are indeed trying to apply both misconceptions simultaneously.

B. Gender Differences in Community Structure

Both men and women consistently answer the “Force Sled” and “Force Graph” items incorrectly on the pretest. The physical principles needed to solve these items are very similar, but the responses to the “Force Sled” items are textual whereas the responses to the “Force Graph” items are graphical. Because the two groups nevertheless form separate incorrect answer communities, the representation chosen for the answer seems to affect the application of the misconception on the pretest for both men and women. These item groups continue to be different communities for women on the post-test; for men, they have generally merged (γ = 0.88) into a single community on the post-test.

Men and women also differ in their application of misconceptions to items involving motion under gravity: “Cart on a Ramp” items, “Coin Toss - Force” items, and “Coin Toss - Acceleration” items. These items form a single community on both the pretest and post-test for women. For men, the “Coin Toss - Acceleration” items are in a different community on both the pretest and post-test. These three groups do apply different misconceptions, with the “Cart on a Ramp” and “Coin Toss - Force” items applying a force proportional to velocity misconception while the “Coin Toss - Acceleration” items apply an acceleration proportional to velocity misconception. If a student understands that force and acceleration are proportional, then these two misconceptions should produce the same results. The pattern of community membership seems to indicate that women apply both misconceptions consistently, while men do not.

While most communities make theoretical sense, both in terms of the item group suggested for the instrument and the physical principles required to solve items in the group identified in Study 2, one does not. For men, one pretest community combines “Acceleration Graphs” with “Coin Toss - Acceleration.” These items require very different physical reasoning for their correct solution, but apply the same misconception, velocity-acceleration undiscriminated. For these items, the misconception is more important in determining the community than the correct answer structure.
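The equivalence invoked above, that a force proportional to velocity answer and an acceleration proportional to velocity answer coincide for a student who links force and acceleration through Newton’s 2nd law, can be written out in two lines. The proportionality constant k is purely illustrative and does not appear in the FMCE items:

```latex
\begin{aligned}
F_{\mathrm{net}} &= m\,a && \text{(Newton's 2nd law)}\\
F_{\mathrm{net}} &= k\,v,\quad k > 0 && \text{(force proportional to velocity)}\\
\Longrightarrow\quad a &= \frac{F_{\mathrm{net}}}{m} = \frac{k}{m}\,v \;\propto\; v && \text{(acceleration proportional to velocity)}
\end{aligned}
```

Conversely, a ∝ v combined with F_net = ma gives F_net ∝ v, so a student who applies one of these misconceptions together with Newton’s 2nd law would be expected to give matching answers on the “Coin Toss - Force” and “Coin Toss - Acceleration” items.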

C. The Strength of Common Misconceptions

One potential application of these results is to provide classroom instructors with a measurement of how strongly a misconception is held by their students. The instructor could then tailor his or her instruction to emphasize material on those subjects. The strength of a misconception community, called the “misconception score,” is defined as the fraction of the responses in the community that are selected by the student. For example, if a community contains {22E, 23G, 24B, 25F, 26A}, a student who selected 22E, 24B, and 26A would have a misconception score of sixty percent, while a student who selected all five answer choices would have a score of one hundred percent. A higher score indicates a more strongly held misconception. A student who answered items 22, 23, 24, 25, and 26 correctly would have a misconception score of zero percent.

The Mann-Whitney U test [69] was used to determine if the misconception scores were significantly different for men and women on the post-test because the data were highly non-normal and discontinuous. The Mann-Whitney U test is a non-parametric test that may be used instead of the unpaired t-test. In this sample, the overall post-test score was higher for men than women: the median number of incorrect responses was 20 for men and 26 for women. The effect size of this difference, measured using Vargha and Delaney’s A statistic [68], was small: 0.63. This indicates that a randomly selected female student will have more incorrect answers than a randomly selected male student 63% of the time. If there were no effect, A would be 0.50, reflecting a 50-50 chance of a score from either group being higher. The small, medium, and large effect sizes for Cohen’s d correspond to values of Vargha and Delaney’s A greater than 0.56, greater than 0.64, and greater than 0.71, respectively.

Table IV presents the A statistic, the mean, 1st quartile (1Q), median (Med.), and 3rd quartile (3Q) for men and women for the misconception scores for each incorrect answer community. The Mann-Whitney U test found a significant difference for every community except “Newton III” (p = 0.07), and all of the A values were in the small or negligible effect size range. Furthermore, all of the A values were lower than the overall probability (0.63) that a randomly selected female student has more incorrect answers than a randomly selected male student. This is consistent with the finding in Study 1 that, while significant differences exist between the misconception scores of men and women, these differences are largely explained by overall differences in the post-test scores of men and women.

Table IV. Percentage of students selecting each incorrect community for the FMCE post-test; mean, 1st quartile (1Q), median (Med.), and 3rd quartile (3Q). A Mann-Whitney U test was performed to determine if the differences between men and women were significant; the p-value is presented. The effect size is given as Vargha and Delaney’s A [68], the probability that a randomly selected woman will score higher than a randomly selected man.

Community | Men Mean (%) | Men 1Q, Med, 3Q (%) | Women Mean (%) | Women 1Q, Med, 3Q (%) | p | A (%) | Misconception
Force Sled, Force Graph | 48 | 10, …, 80 | 59 | 40, …, … | < 0.001 | 59 | Motion implies active forces
Cart on a Ramp, Coin Toss - Force | 48 | 0, …, 83 | 59 | 33, …, … | < 0.001 | 61 | Motion implies active forces
Acceleration Graphs | 27 | 0, …, 60 | 35 | 0, …, … | < 0.001 | 56 | Velocity-acceleration undiscriminated
Coin Toss - Acceleration | 30 | 0, …, 67 | 44 | 0, …, … | < 0.001 | 62 | Velocity-acceleration undiscriminated
Newton III | 43 | 0, …, 80 | 46 | 0, …, 80 | 0.07 | 52 | Greater mass implies greater force; most active agent produces largest force

For the class studied, students hold the motion implies active forces and the Newton’s 3rd law misconceptions more strongly than the velocity-acceleration undiscriminated misconception.

D. Reducing Sparsification

Sparsification is a network analytic term for removing edges from a network to reduce its density. In MMA, sparsification is accomplished by removing nodes selected by a small number of students and edges correlated below some threshold (r < 0.2).

VI. DISCUSSION

A. Research Questions

This study sought to answer four research questions; the first three will be addressed in the order proposed. The fourth research question compares the results of Study 1 for the FCI to the results of this study; differences between the FCI and FMCE will be discussed as part of the answer to each of the first three research questions.

RQ1: What incorrect answer communities are identified by Modified Module Analysis in the FMCE?

The communities of incorrect responses identified on the FMCE generally conformed to the block structure of the instrument and were associated with item groups identified in previous work. This discussion will focus on the analysis retaining nodes selected by 20% of the students; results retaining nodes selected by 5% and 10% of the students are discussed in RQ3. Modified Module Analysis showed that the item groups proposed by Smith and Wittmann were consistently answered using a common misconception: the “Force Sled” (items 1-4, 7), the “Force Graph” (items 14, 16-19), “Acceleration Graphs” (items 22-26), and “Newton III” (items 30-32, 34, 36, 38) [27]. The “Reversing Direction” subgroup of items (items 8-10, 11-13, 27-29) [27] was not consistently identified as an incorrect answer community. The subgroup of items 27-29 sometimes formed its own community and was sometimes grouped with the other items. We proposed renaming the subgroups: “Cart on a Ramp” (items 8-10), “Coin Toss - Force” (items 11-13), and “Coin Toss - Acceleration” (items 27-29). “Cart on a Ramp” and “Coin Toss - Force” items were identified in the same community both pre- and post-instruction and for men and women; “Coin Toss - Acceleration” items were inconsistently identified as part of this community.

Only four misconceptions were identified retaining nodes selected by 20% of the students: motion implies active forces, velocity-acceleration undiscriminated, and two Newton’s 3rd law misconceptions. The Newton’s 3rd law misconceptions, greater mass implies greater force and most active agent produces largest force, were not identified as independent incorrect answer communities. This is consistent with Study 1, which also failed to distinguish the two misconceptions in the FCI. Also consistent with Study 1, the incorrect answer communities contained items testing the same physical principles as identified in Study 2.
The physical principle tested by the item, rather than the misconception, was the most important factor in determining the incorrect answer community. In this study, four separate item groups were associated with the motion implies active forces misconception (Table III): “Force Sled,” “Force Graph,” “Cart on a Ramp,” and “Coin Toss - Force.” Study 2 showed that the first two groups required Newton’s 1st and 2nd law for their solution while the last two required the law of gravitation. While testing the same misconception, the first two groups were never detected in the same community as the last two groups. This is consistent with Study 1, which also identified multiple incorrect answer communities in the FCI measuring the motion implies active forces misconception; these communities also had similar correct solution structure [12].

Study 2 demonstrated that the FMCE has substantially less complete coverage of mechanics than the FCI, which was consistent with previous work by Thornton et al. [26]. The FCI also measures a broader set of misconceptions than the FMCE. Communities associated with 9 different misconceptions were identified in the FCI, while only 4 were identified in the FMCE. While covering fewer misconceptions, the FMCE does measure the critical velocity-acceleration undiscriminated misconception more thoroughly than the FCI. Responses 19A, 20B, and 20C in the FCI are reported to measure this misconception in Hestenes and Jackson [24], but were not detected as an incorrect answer community in Study 1.

Study 1 also identified 3 communities in the FCI that directly resulted from the blocked structure of the instrument. In these communities, the response selected for the second item in an item block would have been correct if the response selected for the first item had been correct. No such communities were identified in the FMCE.
While extensively blocked, the items in the FMCE do not directly refer to the results of previous items.

The communities identified in the FMCE were generally substantially larger than those identified in the FCI. The FCI contained 13 distinct communities for a 30-item instrument, while the FMCE contained 9 communities for a 43-item instrument. In the FMCE, some of the distinct communities resulted from joining other communities. All communities in the FMCE can be formed from 6 groups of items: “Force Sled,” “Force Graph,” “Acceleration Graphs,” “Coin Toss - Acceleration,” “Newton III,” and a community that combines “Cart on a Ramp” and “Coin Toss - Force.” As such, substantially fewer distinct groups of misconceptions are identified in the FMCE; however, the groups were often substantially larger in the FMCE than in the FCI. For the FMCE, the fundamental groups have sizes ranging from 3 to 6, with all but one group containing at least 5 items. Only 2 of the 13 groups in the FCI contain as many as 3 items, with 11 groups containing only two items. Because the incorrect answer communities contain more items, the FMCE may provide a substantially more accurate characterization of the strength of the misconception (Table IV) than the FCI.

The MMA method also provided support for eliminating the problematic items identified by Thornton et al. [26]. With items 5, 6, 15, 20, 21, 33, 35, 37, and 39 included in the analysis, the community structure was complex and rather difficult to interpret, because some of these items were inconsistently associated with a misconception community.

Table V. Misconceptions represented by communities identified in items selected by at least 10% of the students which were not identified in items selected by at least 20% of the students. Items marked * do not have an obvious relation to the misconception. Misconceptions identified by Hestenes and Jackson [24] are bolded.

Community | Misconception
3D, 7D | No force is required to slow an object.
3E, 7C | To slow an object at a constant rate, a decreasing force opposite motion must be applied.
3G, 7A | To slow an object at a constant rate, an increasing force opposite motion must be applied.
8E, 11E, 27E | Gravity exerts a constant force in the direction of motion.
8F, 11F, 27F | Gravity exerts an increasing force in the direction of motion.
8F, 10C, 11F, 13C, 27F, 29C | Gravity exerts an increasing force as an object travels upward and a decreasing force as it travels downward.
8F, 10C, 11F | Gravity exerts an increasing force as an object travels upward and a decreasing force as it travels downward.
11E, 27E | Gravity exerts a constant force in the direction of motion.
14C, 17H, 24G, 26E, 40D, 42C, 43A* | Force-acceleration-velocity undiscriminated from position.
14C, 17D, 17H, 23D*, 24G, 26E, 40D, 42C, 43A* | Force-acceleration-velocity undiscriminated from position.
14C, 17D, 40D, 42C | Force-velocity undiscriminated from position.
14C, 17D, 17H, 40D, 42C, 42H* | Force-velocity undiscriminated from position.
17A, 18D, 19C, 19H, 23F, 24A, 25E, 25G | Velocity proportional to applied force; velocity-acceleration undiscriminated.

RQ2: How are these communities different pre- and post-instruction? How is the community structure different for men and women?

The changes in community structure from pre- to post-instruction were very different for men and women, and, as such, these two questions will be addressed together. The communities identified for men and women were often different; on the FMCE pretest, only three of the nine communities were the same, while on the FMCE post-test, two of the nine were the same. The differences were generally the result of joining two communities with similar correct solution structure as identified in Study 2. Men integrated the “Force Sled” and “Force Graph” item groups on the post-test while women did not; however, women integrated the “Coin Toss - Acceleration” item group with the “Cart on a Ramp” and “Coin Toss - Force” item groups on the post-test while men did not. As such, neither men nor women were more likely to form more integrated misconceptions with instruction. The same physical reasoning is required to solve the items in the larger integrated misconception groups; therefore, more consistency in selecting a misconception may represent progress in recognizing that the items require the same reasoning. The differences between men and women, both pre- and post-instruction, contrast sharply with the results of Study 1 for the FCI, where the incorrect answer community structure was generally very similar for men and women on both the pretest and the post-test.

The change in misconception structure between the pretest and the post-test was dramatically different for men and women. For women, the misconception communities identified were completely consistent from the pretest to the post-test. For men, of the five communities identified pre-instruction, only two were identified post-instruction. The differences resulted from the “Force Graph” and “Force Sled” communities merging post-instruction, possibly indicating that men developed more facility with working with the same type of problem in multiple representations with instruction.
Pre-instruction, the “Acceleration Graphs” and “Coin Toss - Acceleration” item groups were combined; these were separate post-instruction. These groups require different physical principles for their solution; however, both apply the same misconception. This may indicate that men differentiate the ideas of force and acceleration inconsistently pre-instruction.

These results also help to explain the unfairness that was identified in items 27-29 by Henderson et al. [39]. Women consistently integrated this item group (“Coin Toss - Acceleration”) with the other item groups measuring motion under gravity (“Cart on a Ramp” and “Coin Toss - Force”); men did not. “Coin Toss - Force” and “Coin Toss - Acceleration” items differ only in asking about the force on, or the acceleration of, a coin moving under the force of gravity; failing to integrate the misconceptions about force and acceleration seems to indicate either that the student does not understand that force and acceleration are proportional or that the student made some error in interpreting the items.

The strength of the misconception, measured by the misconception score in Table IV, shows how strongly students hold a particular misconception. The gender differences in misconception score were smaller than the overall difference in FMCE score between men and women, showing that there are no particular misconceptions more strongly held by men or women. No gender difference in misconception score was larger than a small effect.
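Vargha and Delaney’s A, used as the effect size in Table IV, is the probability that a randomly chosen member of one group scores higher than a randomly chosen member of the other, with ties counted as one half; it equals the Mann-Whitney U statistic for the first group divided by the product of the two sample sizes. The brute-force Python sketch below computes A directly from that definition; it is quadratic in the sample sizes, so a rank-based formula would be preferable for thousands of students. The helper `effect_label` is hypothetical: it applies the 0.56, 0.64, and 0.71 cut-offs quoted in Sec. V C, and mirroring values below 0.5 is an assumption.

```python
def vargha_delaney_a(group1, group2):
    """A = P(X > Y) + 0.5 * P(X == Y), X drawn from group1 and Y from group2.

    Equivalent to U1 / (n1 * n2), where U1 is the Mann-Whitney U statistic
    of group1. Brute force: O(n1 * n2) comparisons.
    """
    greater = ties = 0
    for x in group1:
        for y in group2:
            if x > y:
                greater += 1
            elif x == y:
                ties += 1
    return (greater + 0.5 * ties) / (len(group1) * len(group2))


def effect_label(a):
    """Hypothetical helper: label A using the cut-offs quoted in the text.

    Values below 0.5 are mirrored (an assumption), so A = 0.30 is treated
    as the same size of effect as A = 0.70.
    """
    a = max(a, 1 - a)
    if a > 0.71:
        return "large"
    if a > 0.64:
        return "medium"
    if a > 0.56:
        return "small"
    return "negligible"


# Toy data (not from the study): group1 wins 3 of 4 pairwise comparisons.
print(vargha_delaney_a([2, 4], [1, 3]))  # prints 0.75
```

Passing women’s misconception scores as `group1` and men’s as `group2` gives A in the orientation used by the Table IV caption, the probability that a randomly selected woman scores higher than a randomly selected man.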

RQ3: How do the communities change as the parameters of the MMA algorithm are modified?

Study 1 investigated variations in two network-building parameters: the correlation threshold r and the community fraction C. These parameters were adjusted to produce productive community structure using the model of the correct solution structure provided in Study 2 and the taxonomy of misconceptions provided by Hestenes and Jackson [24]. The threshold of the minimum number of students who could select a response was not investigated because productive structure was identified retaining only responses selected by at least 30 students, the minimum statistically viable threshold. The FMCE behaved differently; the misconception structure changed dramatically as the threshold for the minimum percentage of students selecting a response was modified.

Retaining nodes selected by at least 5% of the students, MMA identified 35 incorrect response communities; many of these communities were similar, with some differing by only a single response. Retaining responses selected by at least 10% of the students, the structure of the communities was still complex (Table V), but, in general, a single coherent misconception could be identified for each community. Some, but not all, of these misconceptions were described in the taxonomy proposed by Hestenes, Wells, and Swackhamer [1, 70] and refined by Hestenes and Jackson [24].

If responses selected by a minimum of 20% of the students were retained, the community structure simplified substantially (Table I). Examination of the community structure showed that much of the remaining complexity involved the sporadic inclusion of items identified as problematic by Thornton et al. [26]. Removal of these items produced the relatively simple community structure in Table II. With the exception of one male pretest community, these communities each measured a misconception described in Hestenes and Jackson’s taxonomy [24] and required the same physical reasoning described in Study 2.
The male pretest community applied the same misconception, but required different physical reasoning for its correct solution.

The FCI and the FMCE community structures were dramatically different if responses selected by 5% of the students were retained. At this threshold, the FCI had only 13 small communities and the FMCE 35, often fairly large, communities, even though the coverage of the FCI is substantially broader than that of the FMCE. These differences likely resulted from two sources: students in the FCI sample scored substantially higher on the instrument than the students in the FMCE sample, and the FMCE has an unusual distractor structure. The FCI uses only 5 responses for each question, and its incorrect responses were developed from student interviews and include common incorrect student views. The FMCE uses items with more than 5 responses that often exhaust the possible responses. This offers far greater latitude for students to express uncommon misconceptions, which, therefore, are selected by only a small fraction of the students. The broad set of misconception communities identified retaining nodes selected by 10% of the students suggests that the state of student incorrect reasoning may be substantially more complex than the structure measured by the FCI.

VII. IMPLICATIONS

The responses to the FCI were constructed to measure common misconceptions, allowing Hestenes and Jackson to provide a detailed taxonomy of the misconceptions measured by each item [24]. While common misconceptions were certainly considered in the construction of the FMCE, the instrument presents students with many possible incorrect answers. These answers largely exhaust the possible responses. As such, the FMCE may be a much better instrument for a purely exploratory analysis of student incorrect thinking that is less tied to the misconception view.

The identification of incorrect answer communities testing the same misconception allows the calculation of a misconception score as a quantitative measure of how strongly the misconception is held. This should allow instructors to determine which misconceptions are most prevalent in their classes and to target instruction to eliminate these misconceptions.
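The misconception score defined in Sec. V C, the fraction of the responses in an incorrect answer community that a student selected, is simple enough for an instructor to compute directly. The following is a minimal Python sketch, assuming each student’s answers are stored as a set of response labels; the constant `ACCEL_GRAPHS` copies the “Acceleration Graphs” community from Table III, and `class_mean_score` is a hypothetical helper for a class-level summary.

```python
# "Acceleration Graphs" incorrect answer community (Table III).
ACCEL_GRAPHS = ["22E", "23G", "24B", "25F", "26A"]


def misconception_score(community, selected):
    """Fraction of the community's responses that the student selected.

    Responses outside the community (including correct answers) are ignored.
    """
    community = set(community)
    return len(community & set(selected)) / len(community)


def class_mean_score(community, students):
    """Mean misconception score over a class; students is a list of selected-response sets."""
    return sum(misconception_score(community, s) for s in students) / len(students)


# A student who chose 22E, 23G, and 26A scores 3/5 on this community.
print(misconception_score(ACCEL_GRAPHS, {"22E", "23G", "26A"}))  # prints 0.6
```

Averaging per community, as `class_mean_score` does, gives a class-level figure comparable to the means in Table IV; comparing those figures across communities is one way an instructor might decide which misconception to target first.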

VIII. LIMITATIONS

The MAMCR and MMA algorithms require a number of choices to be made by the researcher to produce network structure that is productive in furthering the understanding of a conceptual instrument. As the use of network analysis matures in PER, quantitative criteria for optimally selecting network parameters should be developed.

IX. CONCLUSION

Physics conceptual inventories have played an important role in quantitative physics education research, and understanding students’ difficulties with conceptual physics continues to be a central research area within PER. Network analysis, specifically Modified Module Analysis (MMA), has recently been used as a tool to investigate the common misconceptions on the FCI [21]. The current study replicated this work for the FMCE.

In general, retaining responses selected by 20% of the students, the community structure for the FMCE was consistent with the item groups identified in previous studies [2, 27]. The misconceptions represented by these communities were limited: motion implies active forces, velocity-acceleration undiscriminated, greater mass implies greater force, and most active agent produces greatest force. Three of these incorrect answer communities were previously identified in the FCI [21]; however, the velocity-acceleration undiscriminated misconception was detected as an incorrect answer community only in the FMCE. The FCI was found to measure nine misconceptions in the previous study.

The FCI and the FMCE behaved dramatically differently as network parameters were adjusted. For the FCI, including responses selected by 4% of the students, only 13 communities were detected, most with only two responses. Retaining responses selected by a similar percentage of students, 35 communities were detected in the FMCE, with up to 15 members.

The evolution of the communities identified was dramatically different for men and women. The communities identified for women did not change from pretest to post-test, while only 2 of the 5 communities identified for men remained consistent. Unlike the FCI, there was little consistency in the communities identified for men and women either pre-instruction or post-instruction.

Overall, Modified Module Analysis was productive in understanding the misconception structure of both the FCI and the FMCE and allowed the comparison of the instruments.

ACKNOWLEDGMENTS

Data collection for this work was supported by National Science Foundation grants EPS-1003907 and ECR-1561517.

[1] D. Hestenes, M. Wells, and G. Swackhamer, “Force Concept Inventory,” Phys. Teach., 141–158 (1992).
[2] R.K. Thornton and D.R. Sokoloff, “Assessing student learning of Newton’s laws: The Force and Motion Conceptual Evaluation and the evaluation of active learning laboratory and lecture curricula,” Am. J. Phys., 338–352 (1998).
[3] D.P. Maloney, T.L. O’Kuma, C. Hieggelke, and A. Van Heuvelen, “Surveying students’ conceptual knowledge of electricity and magnetism,” Am. J. Phys., S12–S23 (2001).
[4] L. Ding, R. Chabay, B. Sherwood, and R. Beichner, “Evaluating an electricity and magnetism assessment tool: Brief Electricity and Magnetism Assessment,” Phys. Rev. Phys. Educ. Res., 010105 (2006).
[5] J.L. Docktor and J.P. Mestre, “Synthesis of discipline-based education research in physics,” Phys. Rev. Phys. Educ. Res., 020119 (2014).
[6] T.F. Scott, D. Schumayer, and A.R. Gray, “Exploratory factor analysis of a Force Concept Inventory data set,” Phys. Rev. Phys. Educ. Res., 020105 (2012).
[7] M.R. Semak, R.D. Dietz, R.H. Pearson, and C.W. Willis, “Examining evolving performance on the Force Concept Inventory using factor analysis,” Phys. Rev. Phys. Educ. Res., 010103 (2017).
[8] P. Eaton and S.D. Willoughby, “Confirmatory factor analysis applied to the Force Concept Inventory,” Phys. Rev. Phys. Educ. Res., 010124 (2018).
[9] C. Fazio and O.R. Battaglia, “Conceptual understanding of Newtonian mechanics through cluster analysis of FCI student answers,” Int. J. Sci. Math. Educ., 1–21 (2018).
[10] J. Wang and L. Bao, “Analyzing Force Concept Inventory with item response theory,” Am. J. Phys., 1064–1070 (2010).
[11] T.F. Scott and D. Schumayer, “Students’ proficiency scores within multitrait item response theory,” Phys. Rev. Phys. Educ. Res., 020134 (2015).
[12] J. Stewart, C. Zabriskie, S. DeVore, and G. Stewart, “Multidimensional item response theory and the Force Concept Inventory,” Phys. Rev. Phys. Educ. Res., 010137 (2018).
[13] C. Zabriskie and J. Stewart, “Multidimensional item response theory and the Conceptual Survey of Electricity and Magnetism,” Phys. Rev. Phys. Educ. Res., 020107 (2019).
[14] E. Brewe, J. Bruun, and I.G. Bearden, “Using module analysis for multiple choice responses: A new method applied to Force Concept Inventory data,” Phys. Rev. Phys. Educ. Res., 020131 (2016).
[15] M.E.J. Newman, Networks, 2nd ed. (Oxford University Press, New York, NY, 2018).
[16] K.A. Zweig, Network Analysis Literacy: A Practical Approach to the Analysis of Networks (Springer-Verlag, Wien, Austria, 2016).
[17] F. De Vico, J. Richiardi, M. Chavez, and S. Achard, “Graph analysis of functional brain networks: Practical issues in translational neuroscience,” Philos. T. R. Soc. Lon. B (2014).
[18] J. López Peña and H. Touchette, “A network theory analysis of football strategies,” in Sports Physics: Proc. 2012 Euromech Physics of Sports Conference, edited by C. Clanet (Editions de l’Ecole Polytechnique, Paris, France, 2012) pp. 517–528.
[19] G. Csardi and T. Nepusz, “The igraph software package for complex network research,” InterJournal, Complex Systems, 1–9 (2006).
[20] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2017).
[21] J. Wells, R. Henderson, J. Stewart, G. Stewart, J. Yang, and A. Traxler, “Exploring the structure of misconceptions in the Force Concept Inventory with modified module analysis,” Phys. Rev. Phys. Educ. Res., 020122 (2019).
[22] S. DeVore, J. Stewart, and G. Stewart, “Examining the effects of testwiseness in conceptual physics evaluations,” Phys. Rev. Phys. Educ. Res., 020138 (2016).
[23] “Physport,” . Accessed 8/8/2017.
[24] “Table II for the Force Concept Inventory (revised from 081695r),” http://modeling.asu.edu/R&E/FCI-RevisedTable-II_2010.pdf. Accessed 3/17/2019.
[25] J. Yang, C. Zabriskie, and J. Stewart, “Multidimensional item response theory and the Force and Motion Conceptual Evaluation,” Phys. Rev. Phys. Educ. Res., 020141 (2019).
[26] R.K. Thornton, D. Kuhl, K. Cummings, and J. Marx, “Comparing the Force and Motion Conceptual Evaluation and the Force Concept Inventory,” Phys. Rev. Phys. Educ. Res., 010105 (2009).
[27] T.I. Smith and M.C. Wittmann, “Applying a resources framework to analysis of the Force and Motion Conceptual Evaluation,” Phys. Rev. Phys. Educ. Res., 020101 (2008).
[28] T.I. Smith, M.C. Wittmann, and T. Carter, “Applying model analysis to a resource-based analysis of the Force and Motion Conceptual Evaluation,” Phys. Rev. Phys. Educ. Res., 020102 (2014).
[29] D. Huffman and P. Heller, “What does the Force Concept Inventory actually measure?” Phys. Teach., 138 (1995).
[30] S. Ramlo, “Validity and reliability of the Force and Motion Conceptual Evaluation,” Am. J. Phys., 882–886 (2008).
[31] T.I. Smith, K.A. Gray, K.J. Louis, B.J. Ricci, and N.J. Wright, “Showing the dynamics of student thinking as measured by the FMCE,” in , edited by L. Ding, A. Traxler, and Y. Cao (2018) p. 380.
[32] K.J. Louis, B.J. Ricci, and T.I. Smith, “Determining a hierarchy of correctness through student transitions on the FMCE,” in , edited by A. Traxler, Y. Cao, and S. Wolf (2019).
[33] A. Madsen, S.B. McKagan, and E. Sayre, “Gender gap on concept inventories in physics: What is consistent, what is inconsistent, and what factors influence the gap?” Phys. Rev. Phys. Educ. Res., 020121 (2013).
[34] L.E. Kost, S.J. Pollock, and N.D. Finkelstein, “Characterizing the gender gap in introductory physics,” Phys. Rev. Phys. Educ. Res., 010101 (2009).
[35] S. Salehi, E. Burkholder, G.P. Lepage, S. Pollock, and C. Wieman, “Demographic gaps or preparation gaps?: The large impact of incoming preparation on performance of students in introductory physics,” Phys. Rev. Phys. Educ. Res., 020114 (2019).
[36] M. Lorenzo, C.H. Crouch, and E. Mazur, “Reducing the gender gap in the physics classroom,” Am. J. Phys., 118–122 (2006).
[37] P.B. Kohl and H.V. Kuo, “Introductory physics gender gaps: Pre- and post-studio transition,” in , Vol. 1179, edited by M. Sabella, C. Singh, and C. Henderson (AIP Publishing, New York, 2009) pp. 173–176.
[38] S.J. Pollock, N.D. Finkelstein, and L.E. Kost, “Reducing the gender gap in the physics classroom: How sufficient is interactive engagement?” Phys. Rev. Phys. Educ. Res., 010107 (2007).
[39] R. Henderson, P. Miller, J. Stewart, A. Traxler, and R. Lindell, “Item-level gender fairness in the Force and Motion Conceptual Evaluation and the Conceptual Survey of Electricity and Magnetism,” Phys. Rev. Phys. Educ. Res., 020103 (2018).
[40] A. Traxler, R. Henderson, J. Stewart, G. Stewart, A. Papak, and R. Lindell, “Gender fairness within the Force Concept Inventory,” Phys. Rev. Phys. Educ. Res., 010103 (2018).
[41] L. Viennot, “Spontaneous reasoning in elementary dynamics,” Eur. J. Sci. Educ., 205–221 (1979).
[42] D.E. Trowbridge and L.C. McDermott, “Investigation of student understanding of the concept of acceleration in one dimension,” Am. J. Phys., 242–253 (1981).
[43] A. Caramazza, M. McCloskey, and B. Green, “Naive beliefs in “sophisticated” subjects: Misconceptions about trajectories of objects,” Cognit.
, 117–123 (1981).[44] P.C. Peters, “Even honors students have conceptual dif-ﬁculties with physics,” Am. J. Phys. , 501–508 (1982).[45] M. McCloskey, “Intuitive physics,” Sci. Am. , 122–131 (1983).[46] R.F. Gunstone, “Student understanding in mechanics:A large population survey,” Am. J. Phys. , 691–696(1987).[47] C.W. Camp and J.J. Clement, Preconceptions in mechan-ics: Lessons dealing with students’ conceptual diﬃculties (Kendall/Hunt, Dubuque, IA, 1994).[48] L.C. McDermott, “Students’ conceptions and problemsolving in mechanics,” in

Connecting Research in PhysicsEducation with Teacher Education , edited by Andr´eeTiberghien, E. Leonard Jossem, and Jorge Barojas (In-ternational Commission on Physics Education, 1997) pp.42–47.[49] R. Rosenblatt and A.F. Heckler, “Systematic study ofstudent understanding of the relationships between thedirections of force, velocity, and acceleration in one di-mension,” Phys. Rev. Phys. Educ. Res. , 020112 (2011).[50] N. Erceg and I. Aviani, “Students’ understanding ofvelocity-time graphs and the sources of conceptual dif-ﬁculties,” Croat. J. Educ. , 43–80 (2014).[51] B. Waldrip, “Impact of a representational approach on students’ reasoning and conceptual understanding inlearning mechanics,” Int. J. Sci. Math. Educ. , 741–765 (2014).[52] A.A. diSessa, “Toward an epistemology of physics,”Cogn. Instr. , 105–225 (1993).[53] A.A. diSessa and B.L. Sherin, “What changes in concep-tual change?” Int. J. Sci. Educ. , 1155–1191 (1998).[54] M.T.H. Chi and J.D. Slotta, “The ontological coherenceof intuitive physics,” Cogn. Instr. , 249–260 (1993).[55] M.T.H. Chi, J.D Slotta, and N. De Leeuw, “From thingsto processes: A theory of conceptual change for learningscience concepts,” Learn. Instr. , 27–43 (1994).[56] J.D. Slotta, M.T.H. Chi, and E. Joram, “Assessing stu-dents’ misclassiﬁcations of physics concepts: An ontologi-cal basis for conceptual change,” Cogn. Instr. , 373–400(1995).[57] D. Hammer, “Misconceptions or p-prims: How may al-ternative perspectives of cognitive structure inﬂuence in-structional perceptions and intentions,” J. Learn. Sci. ,97–127 (1996).[58] D. Hammer, “More than misconceptions: Multi-ple perspectives on student knowledge and reason-ing, and an appropriate role for education research,”Am. J. Phys. , 1316–1325 (1996).[59] D. Hammer, “Student resources for learning introductoryphysics,” Am. J. Phys. , S52–S59 (2000).[60] J. 
Minstrell, “Facets of students’ knowledge and relevantinstruction,” in Research in Physics Learning: Theoret-ical Issues and Empirical Studies , edited by R. Duit,F. Goldberg, and H. Niedderer (IPN, Kiel, Germany,1992) pp. 110–128.[61] R.E. Scherr, “Modeling student thinking: An example from special relativity,” Am. J. Phys. , 272–280 (2007).[62] J. Clement, “Students’ preconceptions in introductorymechanics,” Am. J. Phys. , 66–71 (1982).[63] J. Clement, D.E. Brown, and A. Zietsman, “Not allpreconceptions are misconceptions: Finding anchoringconceptions for grounding instruction on students intu-itions,” Int. J. Sci. Educ. , 554–565 (1989).[64] J. Clement, “Using bridging analogies and anchor-ing intuitions to deal with students’ preconceptions inphysics,” J. Res. Sci. Teach. , 1241–1257 (1993).[65] “US News & World Report: Education,” https://premium.usnews.com/best-colleges . Ac-cessed 4/30/2017.[66] M.E.J. Newman and M. Girvan, “Finding andevaluating community structure in networks,”Phys. Rev. E , 026113 (2004).[67] See Supplemental Material at [URL will be inserted bypublisher] for the communities detected at the 5% and10% node retention threshold.[68] Andr´as Vargha and Harold D. Delaney, “A cri-tique and improvement of the “CL” common lan-guage eﬀect size statistics of McGraw and Wong,”J. Educ. Behav. Stat. , 101–132 (2000).[69] H.B. Mann and D.R. Whitney, “On a test of whether oneof two random variables is stochastically larger than theother,” Ann. Math. Statist. , 50–60 (1947).[70] I. Halloun, R.R. Hake, E.P. Mosca, and D. Hestenes,“Force Concept Inventory (revised 1995),” (1995), http://modeling.asu.edu/R&E/Research.htmlhttp://modeling.asu.edu/R&E/Research.html