Group roles in unstructured labs show inequitable gender divide
Katherine N. Quinn, Michelle M. Kelley, Kathryn L. McGill, Emily M. Smith, Zachary Whipps, N.G. Holmes
Center for the Physics of Biological Function, Princeton University, Princeton, NJ 08540, United States
Initiative for the Theoretical Sciences, CUNY Graduate Center, New York, NY 10016, United States
Laboratory of Atomic and Solid State Physics, Department of Physics, Cornell University, Ithaca, NY 14853, United States
Department of Physics, University of Florida, Gainesville, FL 32611, United States
Department of Physics, Colorado School of Mines, Golden, CO 80401, United States
Department of Physics, Cornell University, Ithaca, NY 14853, United States
(Dated: May 18, 2020)

Instructional labs are being transformed to better reflect authentic scientific practice, often by removing aspects of pedagogical structure to support student agency and decision-making. We explored how these changes impact men's and women's participation in the group work associated with labs through clustering methods applied to quantified student behavior. We compared the group roles students take on in two different types of instructional settings: (1) highly structured traditional labs, and (2) less structured inquiry-based labs. Students working in groups in the inquiry-based (less structured) labs assumed different roles within their groups; however, men and women systematically took on different roles, and men behaved differently in single- versus mixed-gender groups. We found no such systematic differences in role division between male and female students in the traditional (highly structured) labs. Students in the inquiry-based labs were not overtly assigned these roles, indicating that the inequitable division of roles was not a result of explicit assignment. Our results highlight the importance of structuring equitable group dynamics in educational settings, as a gendered division of roles can emerge without active intervention.
As the culture in physics evolves to remove systematic gender biases in the field, instructors in educational settings must not only remove explicitly biased aspects of curricula but also take active steps to ensure that potentially discriminatory aspects are not inadvertently reinforced.
The demographic composition of physicists is not representative of the general population, with men over-represented not only in number but also in high-ranking positions within the physics community [1]. In exploring the underlying mechanisms for this, there has been a large focus in education research on gaps in performance between men and women on concept inventories and course grades [2, 3]. While informative, this approach provides an incomplete picture [3, 4]; importantly, student persistence in physics can often be independent of their physics test scores [5]. New strides in science education research now include investigating metrics such as sociocultural factors [6, 7], self-efficacy [8, 9], sense of belonging [10], and identity formation [9, 11, 12]. Moreover, participation in the physics community through the roles people take on within the community can heavily shape one's identity as a physicist [13]. Within any community, members assume different roles as they take on different responsibilities, perform certain functions, and are perceived in specific ways by themselves and by the group [14]. Understanding what roles develop throughout students' physics education is critical, as the field of physics is associated with masculinity, suggesting that a gendered division of roles may greatly influence the modern practice of physics [15, 16].

Students have little direct experience with the field, however [17], and their perceptions of the field and their physics identities are developed through their immersion in physics courses. Many courses (including labs) involve significant group work, which can leverage the fact that strong peer relationships can benefit students' development of their science identities [18–21]. As with group work in other aspects of physics courses (such as cooperative problem solving, tutorials, or in-class lecture activities), lab activities require coordination of group members as they collect and interpret a common data set.
Lab activities are distinct from other learning environments in that there are multiple distinct activities that must be carried out, so division of labor, and thus assignment of distinct roles, is much more common.

In this study, we explored patterns in the behaviors students exhibit in the context of physics labs. In doing so, we aim to better understand the group roles that emerge in these spaces. Labs provide an environment where students interact with peers and engage in physics experiments in ways that can influence their perception of physics and of themselves as physicists [22]. Furthermore, labs are changing nationally in response to calls to provide students with more authentic science experiences [23]. Understanding how students behave and interact with each other in different lab environments, and the roles students assume in these settings, can inform educators and researchers when designing new pedagogy to better address inequities.

Identity formation is a complicated, multi-dimensional process that includes gender, race, physical ability, socioeconomic status, sexual orientation, and religion, among many other factors. The formative process includes individual agency as well as broader cultural and societal factors [24]: the impact of the broader culture outside of the physics classroom strongly influences one's identity formation (such as a culturally perceived notion of physics as a masculine field [25]). Importantly, how one develops a sense of identity impacts the set of available roles one may take on in a particular context, and strongly determines persistence in a particular field [26]. In this study, we analyzed the quantified behavior profiles (discussed further in Sec.
I) as a way of probing the roles students take on in physics labs, to understand some of the ways in which these roles can be equitably or inequitably divided.

We define an equitable division of roles as one in which all members are equally likely to assume each role, i.e., every role is available to every member. Note that this is different from equal or identical roles, in which every student performs the same function and thus all students would behave similarly. An inequitable division of roles is one in which not every role is available to every member; for example, certain members are expected to assume, or prevented from taking on, certain roles. At the individual level, roles that are divided among group members may not be indicative of inequitable division. However, if students systematically behave differently in groups, then broader statistical analyses will reveal these overarching inequities. For instance, roles may be gendered, in the sense that there is an inequitable gender divide, with men and women taking on systematically different roles [27, 28].

Prior research has found that group work often involves inequitable participation between men and women. For example, female students participated less in group discussion when they were outnumbered by male students [29, 30] and responded disproportionately less than male students to instructor-posed questions in lecture [31]. In physics lab courses, students have described the available roles themselves as being either masculine or feminine [22], and women have been found to engage less frequently with hands-on equipment [32, 33] or with computers [34] when working in mixed-gender pairs.
In contrast, contradictory results were found when comparing the performance of male-majority, female-majority, and mixed groups on engineering design tasks across two different courses [35].

Individual students' behaviors can be used to probe the roles they take on in physics labs, and are likely a result of their personal identity (gender or otherwise), the particular instructional context, and the broader physics culture [26, 27, 34, 36–38]. Understanding students' experiences in labs through the behaviors and roles they take on both highlights existing gender disparities and informs future research on students' persistence in science. We specifically sought to understand the impact of different instructional lab environments on student roles and how these roles are divided between men and women. One way to make labs more authentic is to make them discovery-based and inquiry-driven, removing structure from the lab. How does removing pedagogical structure in the lab impact these learning environments? Specifically, what impact does pedagogical structure have on the equitable division of roles within groups?

I. MATERIALS AND METHODS
All participants in this study were undergraduate students at a major research university enrolled in the honors-level mechanics course of a calculus-based physics sequence. The course was designed for physics majors and open to students across the sciences and engineering. The sample of prospective physics majors is an important population for this study, given the potential link between students' experiences, roles, identity, and persistence in physics [26]. We explored students' behaviors in two different types of lab instruction.

The highly structured traditional labs were designed to reinforce physics content knowledge presented in lecture. Students were provided with detailed paper worksheets to follow during lab, guiding them through experiments that provided them with hands-on experience. The lab guides provided explicit details about what and how much data to collect, and posed targeted conceptual physics questions to support making predictions and interpreting results. Students worked in groups to collect data for the experiments and submitted individual paper worksheets.

In contrast, the less structured inquiry labs were designed to emphasize the process of experimentation in physics (see, for example, Refs. [39–42]). Students were provided with a specific goal, but were expected to design their own experiment to achieve that goal. Lab guides prompted students to design data collection methods, to reflect on results, and to design follow-up investigations to improve or extend their investigations. Students worked collaboratively to design and implement their experiments and submitted only one electronic notebook as a group. Reference [40] includes additional detail about the differences between the conditions, including differences in students' learning, engagement with experimentation, and attitudes towards experimental physics.

The same mechanics course was taught twice during the academic year, once in the fall semester and then again in the spring semester.
Students from both semesters were included in this study. During the first semester, all students attended the same lectures and were mixed together in discussion sections, but were separated into the two pedagogically different lab types discussed above (three traditional lab sections and two inquiry lab sections). During the second semester, the two lab sections under study were both inquiry labs. Note that we observed students across multiple lab periods throughout the course of the semester (and each student appeared in only one semester), so while each student is in one lab section, they appear in multiple lab periods. All participants were unaware of the differences between lab types: students in the first semester self-selected into their lab sections prior to the start of the course by registering for the course, and only the inquiry lab sections were available to students in the second semester. Student groups varied every period and were randomly assigned.

The role a student takes on in their group is a highly complex reflection of the function they serve in the group, and depends on numerous factors from the individual to the cultural level. Because this study explores student roles in physics labs, we assume that these roles are in some way correlated with students' behavior in these labs, such as handling of equipment or computer usage. To probe the roles that students assumed in physics labs, we analyzed the quantified behavior profiles of 143 students across multiple lab periods. We collected data for this study at two levels of granularity.

First, coarse behaviors were captured at five-minute intervals for all students in multiple lab periods. The codes were determined by what the students were handling: (1) lab desktop computer, (2) personal laptop or other device, (3) writing on paper, (4) handling equipment, or (5) engaging in some other activity.
We used the Other code to capture all other behaviors, such as discussing within their group, with another group, or with the instructor; engaging in whole-class discussions; writing on whiteboards; or engaging in off-task behaviors. Note that the Other code was constructed to ensure all time was coded for every student, and therefore captures many different behaviors. The choice of codes was designed to capture as much detailed information as possible about every student, coded in real time, while reflecting the lack of a priori knowledge of what the exact group roles were. The behaviors of each student in each lab period were amassed to create a profile of their behaviors during that lab period. Unfortunately, given the observation protocol (discussed in detail in Sec. I B), where each student was observed over the course of an entire lab period, subdividing the Other code could not be done quickly enough and with enough accuracy by the researchers. Instead, a second analysis of such detailed behavior was performed using video from single groups, discussed in greater detail in Sec. I D.
A. Collecting Demographic Information
We used in-class surveys to obtain student demographic information. In all, 143 students across multiple lab sections were included in this study. While they had the option to disclose a gender other than woman or man, no student chose to do so, and only two students did not disclose their gender identity. As a result, all students were included in the initial cluster analysis; however, the gender analysis follows the traditional gender binary of woman or man (with the two undisclosed students omitted from the graphs in Fig. 4 and Fig. 6 due to insufficient statistics). Table I shows the demographic breakdown of student participants in this study. To obtain the standard error on the fraction of a population (such as in Table I or Fig. 6), we used the following:

    Err(p, N) = sqrt( p(1 − p) / N )    (1)

where p is the fraction of the population, and N is the size of the total population.

TABLE I.
Student demographics of this study. Errors were computed using the standard error for population fractions, shown in Eq. 1. In all, 143 students were considered in this study.
         Traditional Labs    Inquiry Labs
         N      %            N      %
Women    11     19 ±         —      —
(remaining table entries not recoverable)

B. Quantifying Coarse Student Behaviors
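Equation 1 is straightforward to compute directly. A minimal sketch (the counts below are hypothetical, not values from Table I):

```python
import math

def fraction_se(p: float, n: int) -> float:
    """Standard error of a population fraction p in a population of size N (Eq. 1)."""
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical example: a fraction p = 0.19 in a population of N = 59,
# giving a standard error of roughly 0.05, i.e. reported as 19 +/- 5%.
se = fraction_se(0.19, 59)
```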
In all lab sections, observers documented student behaviors following the observation protocol used in Ref. [34]. Every five minutes, an observer noted each student's actions in the lab using one of five codes: Desktop, Equipment, Laptop, Paper, and Other. One code was applied to each student in the class at each five-minute interval, except in cases where students could not be observed (e.g., because they were late or left early). The codes are described in Table II, and were based on what a student could be handling in the lab. The Other code captured all other behaviors, such as engaging in whole-class discussions, writing on whiteboards, discussing with the TA or UTA, and off-task behaviors, ensuring that all in-lab time was coded. The Desktop code was separated from the Laptop code because the desktop was often required for data collection (e.g., because it was directly connected to a detector or piece of equipment). Furthermore, desktops were shared within groups, whereas the Laptop code was ascribed to students handling personal devices. While desktops were present in both lab types, only students in the inquiry labs actively used laptops to analyze data, document their lab procedures, and submit their electronic notebooks.
TABLE II.
Action codes used in observations. The Laptop code is used for handling either a laptop or another personal device (students used laptops, phones, and tablets for the purposes of notetaking, writeup, data analysis, and reading instructions in the inquiry labs).
Code Description
Desktop      Using the desktop computer at the lab bench.
Equipment    Handling equipment.
Laptop       Using a laptop or personal device.
Paper        Writing on paper or in a notebook.
Other        Other action or behavior.
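A student's per-period profile is then simply the tally of these codes over the five-minute observations. A minimal sketch (the code sequence is invented for illustration):

```python
from collections import Counter

CODES = ("Desktop", "Equipment", "Laptop", "Paper", "Other")

# Hypothetical sequence of five-minute observations for one student in one lab period
observed = ["Other", "Equipment", "Equipment", "Desktop", "Other", "Paper", "Other"]

# Count how often each code was recorded, in a fixed code order
tally = Counter(observed)
profile = [tally[c] for c in CODES]  # [1, 2, 0, 1, 3]
```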
The codes used in this study, in particular the Other code, are very coarse, and so multiple behaviors can fall under the same code (e.g., the Laptop code includes using a laptop for data analysis as well as for note taking; the Other code captures activities such as discussing the lab with group members or engaging in off-task talking with group members). Given the observation protocol, it was not possible for an observer to differentiate between these different and more nuanced behaviors in real time for every student, and so a second analysis was performed, the details of which are outlined in Sec. I D.

To validate our observation procedure, two observers coded student actions in the same lab period using the described protocol but at different five-minute intervals. If we had had each observer code the same student at the same time, we would have only evaluated the reliability of the codes. Instead, observers were specifically not coding the same student at the same time. Thus, comparing the overall code count for each student provides a measure of reliability of the codes recorded at five-minute intervals. By comparing the overall code count for each student, we provide a measure of reliability of the overall method. This method limits us, however, from comparing individual student behavior over time in the lab period. Thus, all analysis is performed on the student profiles, which aggregate their behaviors throughout the lab period. Note that because observers were explicitly not observing the same student at the same time, percent agreement or calculating Cohen's Kappa would not provide the necessary information to validate the method. Instead, a standard chi-squared analysis was performed on the contingency table constructed from the accumulated codes (the frequency each observer noted each code, summed over all students). We used the criterion that if two sets of observations are statistically indistinguishable from each other, then the observers captured the same overall profiles for the students in the lab session. Note that if either (1) there was not agreement between the codes, or (2) the five-minute interval did not accurately capture student behavior when averaged over a lab period, then there would be disagreement in these overall distributions.

In all cases observers' distributions were statistically indistinguishable, and so single observers coded subsequent lab periods. When attempts were made at subdividing the codes, for instance to capture students performing data analysis vs. notetaking, or to identify whether group discussions were off task, we were not able to obtain agreement between observers. As such, we used the protocol detailed in this section. We provide an example of observer comparisons for illustrative purposes. A sample graph of the accumulated codes for two observers in a traditional lab section is presented in Fig. 1. The contingency table constructed from these observations is given by Table III. Because the two distributions are statistically indistinguishable, the observers captured the same distribution of student actions.

Because students were observed during multiple lab periods over a full semester, we were able to document individual students more than once. As a result, we obtained 522 unique student profiles, each quantifying the actions of one student in one lab period through the frequency of associated codes. Table IV shows a demographic breakdown of the student profiles used in this study.

FIG. 1. Bar plot of code counts from two observers, used to form the basis of a chi-squared test to validate the observation protocol used in this study. Two observers documented the same lab period, and the resulting contingency table (given by the raw counts displayed on the graph and shown in Table III) was used to determine statistical validity of the method. Here, the two distributions are statistically indistinguishable, indicating that the observers captured the same distribution of student actions.

TABLE III. Sample contingency table used to determine if two distributions are statistically different. Two observers documented the same lab period, and a chi-squared test was performed to determine if the resulting distributions are statistically similar or dissimilar. Here, we obtain p > 0.1, indicating that the observers captured the same distribution of student actions.

Observer    Desktop    Equipment    Laptop    Paper    Other
(table counts not recoverable)
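The validation check described above can be sketched as a chi-squared test on a 2 × 5 contingency table of the two observers' accumulated code counts. The counts below are hypothetical, and the statistic is compared against the 5% critical value for (2 − 1) × (5 − 1) = 4 degrees of freedom:

```python
def chi2_stat(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical accumulated counts: Desktop, Equipment, Laptop, Paper, Other
observer_a = [30, 20, 10, 15, 40]
observer_b = [28, 22, 11, 14, 39]
stat = chi2_stat([observer_a, observer_b])

# With df = 4, the 5% critical value is about 9.49; a statistic well below it
# means the two observers' overall distributions are statistically indistinguishable.
indistinguishable = stat < 9.49
```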
TABLE IV.
Demographic breakdown of student profiles measured in this study. Errors were computed using the standard error for population fractions, shown in Eq. 1. In all, 143 students were observed across multiple lab periods, resulting in 522 unique student profiles.
         Traditional Labs    Inquiry Labs
         N      %            N      %
Women    34     18 ±         —      —
(remaining table entries not recoverable)

FIG. 2.
Box plots of raw data revealing the highly non-Gaussian nature of the code distributions. Each faded point is the accumulated codes for a student in a lab period for a particular category (the horizontal spread of the points is just to visualize all the points), and so darker regions represent more total codes of that value (with the darkest regions near zero). Note that the median for all codes except Other is less than or equal to one, reflecting the fact that over half of students were observed engaging in that behavior once or less. This, combined with the fact that there are a large number of outliers, is an indication that students either engage in a particular activity a lot or not at all.
C. Cluster Analysis
The distributions of coarse behavior code frequencies are highly skewed, with most students engaging in a particular activity infrequently or not at all, and some students engaging in an activity a lot. Figure 2 shows box plots of the raw data, illustrating the non-Gaussian features of the data. For this reason, we performed a cluster analysis instead of using methods that rely on the assumption of Gaussian distributions. Clustering can account for non-linearities missed in common regression analyses, capturing dominant behavior as opposed to average behavior, and has been used in similar studies of this type to provide fruitful results [43]. By performing a demographic analysis on the student groupings (i.e., clusters), we can quantitatively characterize coarse gendered behavior.

To perform a cluster analysis on multidimensional data, the scales for each measure must be the same. In this study, there were two major effects that caused differences in scales, which we accounted for. First, the amount of coded time for each student was highly variable, ranging from less than 45 minutes to over 175 minutes. To account for this effect, we normalized each student profile; in this way, each measure represents the fraction of time spent on a particular task. Second, there are inherent differences between the five measures. For instance, from Fig. 2, we can see that the distribution for Other is more spread out than for Equipment. To account for this, each measure was grand mean scaled so that, averaged over all students, each measure had a mean of 0 and a standard deviation of 1. In doing so, each measure becomes a Z-score [43, 44]. Thus, each student's Z-score tells us whether the time they spent on a particular activity was above or below average as compared to other students.
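The two-step rescaling described above (normalize each profile to time fractions, then grand mean scale each measure) can be sketched as:

```python
import numpy as np

def to_zscores(profiles):
    """Rescale raw code counts (one row per student profile, one column per code).

    Step 1: divide each profile by its total, so entries are fractions of coded time.
    Step 2: grand mean scale each measure (column) to mean 0 and standard deviation 1.
    """
    p = np.asarray(profiles, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)
    return (p - p.mean(axis=0)) / p.std(axis=0)
```

In this Z-score form, the Euclidean distance between two rows measures the dissimilarity of two profiles in units of standard deviations, as noted in Ref. [43].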
Moreover, the Euclidean distance between two profiles has a statistical interpretation in this Z-score format: it measures the dissimilarity of two student profiles in units of standard deviations [43].

We performed a standard k-means clustering on the rescaled student profiles. K-means is an iterative algorithm, where the researcher specifies the number of clusters. The algorithm clusters and then re-clusters the data in an iterative manner until the sum of the squares of the distances from all points to their respective cluster's center is minimized and no point changes cluster between iterations [45].

Note that not all data can be meaningfully clustered. For example, even if all data form a structure-less blob, a researcher can still input two or more clusters and the algorithm will converge to a solution. Therefore, in order to determine (1) if the data are clusterable, and (2) if so, what the optimal number of clusters is, we used the elbow method [46]. We plotted the average squared distance from each point to the center of its assigned cluster, as a function of the number of clusters, and compared the results to 10,000 randomly generated student profiles. We used enough random data to numerically generate a smooth function and ensure that the comparison is not hindered by statistical fluctuations. The results of the elbow plot are shown in Fig. 3. The plot for our collected data was substantially below random, indicating that the data are clusterable. There is a distinct kink in the plot at five clusters, indicating that the optimal number of clusters is five.

From the elbow plot in Fig. 3, specifically from looking at the drop in average squared distance from each point to the center of its cluster for five clusters compared to one, we can see that the five optimal clusters account for 70% of the variance in the data. By looking at the distances confined to each of the five measures (i.e., generating figures similar to Fig. 3 for each measure, where the maximum value would be one instead of five), we found that the five optimal clusters account for 73% of Desktop use, 60% of Equipment use, 78% of Laptop use, and 59% of Other activities. This is well above the 50% threshold used for a study of this type [43, 44].

We provide a 2D visualization of the clusters using t-SNE [47], with each dot representing a profile colored by its assigned cluster (Fig. 4). Figure 4 is a two-dimensional representation of a five-dimensional space, and so is used primarily for qualitative illustration.

Clusters from k-means are characterized by their centers. Here, the centers of the five clusters matched the five codes used in this study, and so we labelled the clusters accordingly. Therefore, the clusters characterize "high users" of a particular measure. Note that this description fits with the raw data, shown in Fig. 2, which illustrate that the majority of students engage in a particular task either frequently or very rarely. For example, students in the yellow cluster of Fig. 4 spent a larger fraction of their time on the equipment than the average student, so this cluster is referred to as the Equipment cluster. This is a feature of student behaviors, and not due to the number of codes used in this study. For instance, one could imagine a scenario in which all students behave nearly identically, with minor differences described by fluctuations: in that case, the data would form a five-dimensional Gaussian cloud centered at zero, and an elbow plot that matches random noise. Or, one could imagine a scenario in which only students handling equipment handle the lab desktop, in which case a cluster would emerge that couples the two respective codes.

We used the clusters that emerged from the data to coarsely characterize the roles students take on in labs. Generally, roles within groups are complex and multidimensional, and could be explored in greater detail through more detailed video analysis (discussed in Sec. I D), student interviews, or anthropological investigations. The analysis performed here provides a coarse-grained perspective on the division of roles within groups, and will ultimately reveal the unexpected inequities in role divisions (discussed next).

Because each student had multiple profiles, arising from several lab periods over the course of a semester, we investigated whether or not it is possible to further collapse the profiles to determine "semester-long" behaviors. We did this by analyzing whether or not individual students' profiles appear in multiple clusters over the course of a semester. In the traditional labs, 87 ± 4% of students have profiles appearing in more than one cluster. Similarly, 86 ± 4% of students in the inquiry labs appear in more than one cluster. Because so many students have profiles appearing in multiple clusters, the weekly variation in an individual's profile is too great to further collapse (for numerous reasons, such as variability in lab content and students changing lab partners).

FIG. 3. Elbow plot used to determine the optimal number of clusters for the data. The average squared distance from each point to the center of its assigned cluster is plotted as a function of the number of clusters. There is a kink at five, indicating that the optimal number of clusters for the data is five. Our results were compared against 10,000 randomly generated student profiles. Note that the elbow is well below random, a sign that the data can be clustered. Superimposed on the graph is a two-dimensional visualization of the data and random points for qualitative comparison. The data show structure (brown points in lower left), whereas the random points form a blob (grey points in center right).
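The clustering step can be sketched with a bare-bones k-means and the elbow-plot quantity (average squared distance from each point to its assigned center). This is an illustrative re-implementation under the description above, not the authors' code:

```python
import numpy as np

def kmeans(z, k, n_iter=100, seed=0):
    """Bare-bones k-means: assign points to the nearest center, recompute centers,
    and repeat until no center moves between iterations."""
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([z[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def elbow_value(z, labels, centers):
    """Average squared distance from each point to its assigned cluster center,
    the quantity plotted against the number of clusters in an elbow plot."""
    return np.mean(np.sum((z - centers[labels]) ** 2, axis=1))
```

Plotting `elbow_value` against k, for both the real profiles and randomly generated ones, reproduces the kind of comparison described for Fig. 3.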
D. Describing Detailed Student Behavior
We used video recordings of single groups during full lab periods to describe student behavior in more detail than captured in the previous section. In all, ten videos were coded, decomposing 23 profiles from 17 students (five students appeared in more than one video). BORIS software was used to code the videos [48], specifically the fraction of time students engaged in different behaviors.

The five codes in Table II were further broken down by what a student was doing (e.g., analyzing data) while engaged in that coarse behavior (e.g., using the Desktop), as shown in Fig. 5. The Paper code was used predominantly to describe students filling out paper worksheets in the traditional labs, and so it was not further decomposed. Students in the inquiry labs predominantly used whiteboards for calculations, and very rarely used paper. Both the Desktop and Laptop codes were used to describe students analyzing data, collecting data, or writing lab notes, and so both of these codes were broken down in this way. However, when collecting data, the desktop was often connected directly to equipment, whereas gathering data on a laptop was purely represented by students manually entering data into their electronic notebook or analysis software. Students handling equipment were primarily doing so to either collect data or manipulate the setup in some way (setup, cleanup, calibration, playing), and so the Equipment code was further decomposed into these two tasks. In this way, the Desktop, Equipment, and Laptop codes were explicitly decomposed.

To better describe student behavior while coded as Other, we introduced four new state codes. These were used to describe significant events in lab, and are elaborated in Table V.
By overlapping the event codes with Other, we broke down the Other code and provide a more qualitative picture of classroom activities, such as engaging in whole-class discussions, using whiteboards to sketch out ideas and concepts, single-group discussions with the TA or UTA, or engaging in inter-group discussions with neighboring groups.

To validate this method, two observers coded the same video as a means of testing the inter-rater reliability. The level of agreement was assessed with Cohen's Kappa, where a value of 0.61–0.80 represents substantial agreement. The two observers obtained a Cohen's Kappa value of 0.79, indicating substantial agreement between the two. As a result, only one re-
FIG. 4.
Two-dimensional visualization of behavior clusters and their centers. Each point represents a unique studentprofile, with profiles from the same group connected by a grey line (solid for less-structured inquiry labs, and dashed forhighly-structured traditional labs). Circles represent students in the traditional labs and stars in the inquiry labs, and blackedges indicate women’s profiles. All points in the Laptop cluster are stars, whereas all points in the Paper cluster are circles, areflection of the pedagogical differences in the labs (students in the traditional labs were filling out paper worksheets, whereasin the inquiry labs were filling out electronic notebooks). Clusters are characterized by their centers, and here the centers ofthe five clusters are given by large Z-scores for each of our codes. (b) Sample Profile from Time-Coded Video(a) Breakdown of Codes
FIG. 5.
Breakdown of codes by decomposing coarse behavior (e.g. “handling laptop”) into more fine-grained behavior(e.g. “analyzing data”). Ten videos were coded, resulting in 23 decomposed profiles from 17 different students (five studentsappeared in more than one video). (a) A breakdown of each code, showing the fraction of time students engaged in a particulartask while coded as a particular behavior. Three of the five codes (Desktop, Equipment, and Laptop) were directly decomposedinto sub-codes while analyzing videos, as shown in (b) illustrating a sample coded time-series. Four additional “group states”were coded in the videos, representing large group behavior (discussing with a TA or UTA, conversing with other groups, wholeclass discussions and announcements, and using a whiteboard). We decomposed the Other code by overlapping it with theselarger group states. The Paper code was purely represented by students filling out paper worksheets in the traditional labs.
TABLE V.
Event codes used in video observations .These codes described significant events in the lab, and wereused to decompose the more coarse-grained Other code. Asample time series illustrating a coded video is shown inFig. 5(b)
Code Description
Whole ClassDiscussion The TA or UTA makes an an-nouncement to the class, or holdsa whole class discussion.Whiteboarding Students perform invention activ-ities in the lab, and use a whiteboard to sketch out ideas andconcepts.Single Group Discus-sion with the TA TA or UTA engages in a discus-sion with the group (but not aspart of a whole class discussion).Inter-GroupDiscussion Groups compare results or discussamong each other (not as part ofa whole class discussion). searcher coded the subsequent videos.Video analysis was also used to better understand taskallocation. Point-events were identified when one studentexplicitly instructed another to perform a task. We brokedown the criteria for inclusion as a point event and ex-clusion as a point event in the following way: • Criteria for Inclusion:
A student needs to be ad-dressing another, and explicitly direct them in someway, such as by saying “you should do X”. • Criteria for Exclusion:
Suggesting a task should bedone that a student assumes without being asked isnot included. Examples of such events are charac-terized by statements such as “We should do X.”, “Ithink we should focus on X.”, “Does someone wantto work on X?”. Additionally, a student asking an-other for help performing a task is excluded (suchas asking another student how to sum a row in aspreadsheet, and the student telling them how).In total, we found eight point events for inclusion fromall ten videos. All such events were quick, directed com-ments related to a task the student was already engagingin. Therefore, as described in the main text, we concludethat no tasks were explicitly assigned by another student.
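The inter-rater reliability check above can be reproduced by computing Cohen's Kappa on two observers' per-time-bin code labels. A sketch under the assumption of discretized time bins; the ten labels below are hypothetical, not the study's data:

```python
import numpy as np

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: agreement between two raters, corrected for chance."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    codes = np.unique(np.concatenate([labels_a, labels_b]))
    p_o = np.mean(labels_a == labels_b)  # observed agreement
    # Chance agreement from each rater's marginal code frequencies
    p_e = sum(np.mean(labels_a == c) * np.mean(labels_b == c) for c in codes)
    return (p_o - p_e) / (1 - p_e)

# Two raters coding the same ten time bins (hypothetical labels)
rater1 = ["Laptop", "Laptop", "Desktop", "Equipment", "Other",
          "Laptop", "Desktop", "Desktop", "Equipment", "Laptop"]
rater2 = ["Laptop", "Laptop", "Desktop", "Equipment", "Laptop",
          "Laptop", "Desktop", "Other", "Equipment", "Laptop"]
kappa = cohens_kappa(rater1, rater2)
```

With eight of ten bins matching, this toy example lands in the "substantial agreement" band, the same criterion applied to the study's 0.79.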
II. RESULTS

A. Identifying course-wide behavior patterns through cluster analysis
We analyzed the demographic composition of each behavior cluster by lab type (highly structured traditional or less-structured inquiry-based), gender (students' self-reported gender identity of man or woman), and group composition (mixed-gender or single-gender groups). In all cases, when comparing the composition of behavior clusters, we used a chi-squared test of frequencies on the contingency tables of the raw counts.

When broken down by lab type (shown in Fig. 6(a)), 60% of the student profiles in the traditional labs were in the Paper cluster, indicating that the majority of students in the traditional labs were high paper users. Students in the inquiry labs engaged in a more varied set of activities, demonstrated by the uniform distribution of student profiles across clusters. In the traditional labs, however, student profiles were predominantly found in the Paper cluster, with few profiles in the remaining clusters.

Our data support the notion that labs with reduced structure provide a wider range of available roles. We tested this explanation by examining the range of roles within individual groups in each class type: Do members within a group predominantly fall into the same or different clusters? In the traditional labs, 43% of groups had all members in the same cluster (predominantly the Paper cluster), whereas only 14% of groups in the inquiry labs had all members in the same cluster (Fig. 7).

We note that groups in the traditional and inquiry labs were of varying sizes. Groups in the traditional labs typically had three or four students, whereas groups in the inquiry labs typically had two or three members, with group sizes determined by logistical constraints of the lab spaces (such as the number of available lab benches given the size of each class) and mainly assigned randomly by the instructor. Moreover, mixed-gender groups had between one and three women and between one and three men.
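The chi-squared test of frequencies, the Cramér's V effect size quoted later, and the standard error on a fraction used for the error bars in Fig. 6 can all be computed from the raw counts. A sketch with an illustrative contingency table (not the study's counts):

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_with_cramers_v(table):
    """Chi-squared test on a contingency table plus Cramér's V effect size."""
    table = np.asarray(table, dtype=float)
    chi2, p, dof, _expected = chi2_contingency(table)
    n = table.sum()
    k = min(table.shape) - 1
    return chi2, p, dof, np.sqrt(chi2 / (n * k))

def fraction_se(f, n):
    """Standard error on an observed fraction f out of a sample of size n."""
    return np.sqrt(f * (1 - f) / n)

# Hypothetical counts: rows = gender (men, women), columns = behavior clusters
table = [[12, 4, 9, 7],
         [5, 11, 6, 8]]
chi2, p, dof, v = chi2_with_cramers_v(table)
```

With two rows and four columns the test has three degrees of freedom, matching the χ²(3) statistics reported below.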
Observers documented the behavior of all students in every group, and kept track of which student was in which group. One could expect that, in groups with more members, there is an increased chance of task division occurring. While groups in the traditional labs typically had more members than those in the inquiry labs, Fig. 7 in fact shows proportionally fewer groups in the inquiry labs with members in identical clusters, supporting the conclusion that groups in the inquiry labs were more likely to divide tasks.

We infer that the set of available roles is much greater in the inquiry labs and that students assumed distinct roles from one another. The traditional labs were highly guided, leaving students little room for active decision-making about the experiment. While they worked in groups, each student was responsible for completing their own individual worksheet. As a result, the set of available roles was both confined and manifestly similar for all students. In contrast, the inquiry labs were designed to emphasize the process of experimentation, and thus students were supported in exercising agency for active decision-making about the experiment. As a result, the set of available roles was larger and students could divide tasks in a variety of ways.

We next sought to evaluate whether men and women assume different roles. We decomposed the behavior clusters by gender and lab type, as shown in Fig. 6(b). Through a chi-squared test of frequencies, we found a statistically significant difference between men and women in the inquiry labs (χ²(3) = 10.…, p = 0.…, V_Cramer = 0.15) but none in the traditional labs (χ²(3) = 3.…, p = 0.…, V_Cramer = 0.…; χ²(3) = 12.…, p = 0.…, V_Cramer = 0.…; χ²(3) = 10.…, p = 0.…, V_Cramer = 0.…; p > 0.17 in all cases). Furthermore, due to insufficient statistics, we were unable to perform a similar analysis for groups of varying sizes.

The difference in men's behavior when in mixed- and single-gender groups may be indicative of the impact of social context on the roles students assume. In groups with only men, there may be different social dynamics compared to groups that include women, changing the set of available roles (and thus observed behaviors). For instance, the increased number of high-equipment users in men-only groups may be the result of "playfulness" [49] when women are not in the group, or members of mixed-gender groups may have been more efficient with equipment use.

FIG. 6. Cluster compositions for each of the five clusters, broken down by lab type, gender, and group composition. In all plots, the y-axis represents the fraction of student profiles, and errors are calculated using the standard error on the fraction of the population shown (see Eq. 1 for additional details). (a) Cluster distributions broken down by lab type. (b) Clusters further broken down by gender. There are disproportionately more women than men in the Laptop cluster, and disproportionately more men than women in the Equipment cluster. (c) Cluster distributions further broken down in the inquiry lab by group type (men and women in mixed-gender and single-gender groups). Upon inspection, the Laptop difference remained, while a difference emerged in Other. Furthermore, far more men are high equipment users when in single-gender groups. Due to insufficient statistics, no comparison can be made with women in single-gender groups, and those data are presented for completeness.

FIG. 7. Fraction of groups with members in identical clusters (light ring) and different clusters (dark ring), illustrating role division in the different labs. Almost half of the groups in the traditional labs had all members in the same cluster (primarily the Paper cluster), whereas the majority of groups in the inquiry labs had members in multiple clusters, indicating an increase in task division.
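The clustering pipeline itself is described elsewhere in the paper; one plausible sketch, consistent with the cited k-means reference [45], clusters z-scored fraction-of-time profiles and characterizes each cluster by its center, as in Fig. 4. The synthetic profiles and k = 2 below are illustrative only (the study identified five clusters):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
# Synthetic student profiles: columns = fraction of time in
# (Desktop, Equipment, Laptop, Paper, Other)
high_laptop = rng.normal([0.1, 0.1, 0.6, 0.0, 0.2], 0.03, size=(10, 5))
high_paper = rng.normal([0.1, 0.1, 0.1, 0.6, 0.1], 0.03, size=(10, 5))
profiles = np.vstack([high_laptop, high_paper])

# z-score each code so that cluster centers are expressed as Z-scores,
# matching how Fig. 4 characterizes the clusters
z = (profiles - profiles.mean(axis=0)) / profiles.std(axis=0)

centers, labels = kmeans2(z, 2, minit="++", seed=0)
```

A large positive Z-score in a center (e.g., on the Laptop code) marks that cluster's defining behavior; the number of clusters would in practice be chosen with a criterion such as the elbow method [46].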
B. Quantifying the relative behaviors of students within groups
The cluster analysis in the previous section indicates that individual students took on different roles on a course-wide level, but suggests that group composition may impact the group dynamics in a non-trivial way. To investigate roles within individual groups, and to ensure that different analysis methods obtain non-conflicting results, we compared each student's profile to those of their group members. We quantified the relative behaviors by constructing a deviating profile for each student to describe how they differed from their group's average profile (quantified as the numerical difference of the student profile from the group average; see Appendix A for additional details). For example, if all students in a group behaved the same, the profiles of every student would match their group's average, and thus they would each have a deviation of zero for each code. The distribution of all students' deviations for each code has a mean of zero, as the deviations in every group must cancel each other out. However, the variances of these distributions (defined here as the intragroup variance) are not constrained and indicate the degree of task division. An intragroup variance of zero implies that any student's behaviors are completely indistinct from their group, while a large intragroup variance reveals a greater degree of divide-and-conquer.

In the traditional labs, the intragroup variance was very small for all coded behaviors other than Paper (Fig. 8(a)). This result supports the analysis and interpretation from the cluster analysis: groups in the traditional labs did not divide roles, and each student behaved similarly to their group members. In the inquiry labs, intragroup variances were much larger for all codes apart from Paper, which indicates that a high degree of task division took place.

Within the inquiry labs, we found comparable intragroup variances among all coded behaviors regardless of the group's composition (that is, single- versus mixed-gender groups; Fig. 8(b)).

FIG. 8. Intragroup variances of the relative behaviors among students, signifying the amount of task division within groups. Each plot shows VAR(ΔN) for all student profiles contained within the labeled lab and group types, along with their Bayesian confidence intervals. (a) Comparing across lab types, the intragroup variances are remarkably larger in the inquiry lab groups than in the traditional lab groups for all codes besides Paper, indicating a greater range of behaviors and an increase in task division. (b) Within the inquiry labs, the intragroup variances are comparable among groups of differing composition, suggesting that similar degrees of task division were taking place. (Female single-gender groups are not included due to insufficient statistics.)
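The deviating profiles and intragroup variances above amount to subtracting each group's mean (Eqs. A1 and A2 in Appendix A) and taking the variance of the pooled deviations, with Levene's test comparing variances across lab types. A sketch with made-up counts for a single behavior code:

```python
import numpy as np
from scipy.stats import levene

def deviations(group_counts):
    """Delta N for each student in one group: observed counts of a behavior
    code minus the group's mean count (Eq. A2)."""
    group_counts = np.asarray(group_counts, dtype=float)
    return group_counts - group_counts.mean(axis=0)

# Hypothetical per-student counts of one code, grouped by lab group.
# Inquiry groups divide the task unevenly; traditional groups do not.
inquiry_groups = [[40, 5, 10], [2, 35, 20], [8, 30]]
traditional_groups = [[12, 14, 11, 13], [10, 9, 12]]

dev_inq = np.concatenate([deviations(g) for g in inquiry_groups], axis=None)
dev_trad = np.concatenate([deviations(g) for g in traditional_groups], axis=None)

intragroup_var_inq = dev_inq.var()   # VAR(Delta N), inquiry labs
intragroup_var_trad = dev_trad.var()  # VAR(Delta N), traditional labs

stat, p = levene(dev_inq, dev_trad)   # equality-of-variances test
```

By construction the pooled deviations have zero mean, so only their spread carries information, which is why the comparison runs on variances rather than means.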
This result suggests that the group composition does not impact the group dynamics with respect to the amount of task division; that is, single- and mixed-gender groups divide roles to similar degrees.

However, within mixed-gender groups, these roles are divided along gender lines. The distributions of deviations for men and women in mixed-gender groups differed significantly for the Laptop (p = 0.…) and Other (p = 0.…) codes, with a borderline difference for the Equipment code (p = 0.…).

C. Understanding specific student tasks and identifying role assignments
We captured video of a subset of individual groups for entire lab periods and identified the specific tasks associated with the coarse behaviors discussed in the previous sections. For example, when a student was handling the equipment, were they collecting data or setting up the apparatus? We identified the specific tasks through visual cues and students' speech. We then measured the total amount of time each student spent on each specific task.

The cluster and intragroup analyses found significant differences between men and women in mixed-gender groups with regards to laptop usage and Other activities. The individual group video analysis found that women spent about twice as much time as men analyzing data on laptops (14 ± 7% of the lab period for women and 6 ± …% for men), while men spent more of the lab period engaged in Other behaviors (… ± 4% of the lab period for men and 26 ± 5% for women).

We also used the single-group video analysis to identify that, in almost all cases, students did not discuss the roles they would assume. Notably, there were no instances of explicit role allocation from peers in the group or from lab instructors. We conjecture that students either self-assigned roles within groups, 'fell into' roles, or directed each other through positioning (subtle verbal and non-verbal social cues [50, 51]). Exploring mechanisms for role allocation is the focus of future study, to better understand how roles become gendered. Tentatively, we conclude that the significant difference in roles is not the result of overt, explicit allocation. Rather, we infer that subtle interactions at the individual level accumulate to create class-level patterns.
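The comparisons of men's and women's deviation distributions referenced above used the Mann-Whitney U test (see Fig. 9), a non-parametric test suited to these small samples. A sketch with hypothetical deviations chosen to separate cleanly:

```python
from scipy.stats import mannwhitneyu

# Hypothetical Laptop-code deviations (Delta N) in mixed-gender groups:
# positive values mean more laptop use than the rest of the group
women_dev = [12.0, 8.5, 15.0, 6.0, 10.5, 9.0]
men_dev = [-11.0, -7.5, -13.0, -5.0, -9.5, -8.0]

res = mannwhitneyu(women_dev, men_dev, alternative="two-sided")
```

Because every value in one sample exceeds every value in the other, this toy comparison is about as significant as two samples of six can be; the study's real distributions (Fig. 9) overlap far more.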
III. DISCUSSION AND CONCLUSIONS
In this study, we identified how student behaviors in a lab vary by lab type, gender, and group composition. From coarse-grained observations of what students were handling in the lab, we found that students in traditional labs generally behave similarly, spending most time writing on the lab worksheets. Behaviors in the inquiry labs were much more varied, with behaviors focused on using equipment and computers. Furthermore, women in the inquiry labs tended to be high laptop users (primarily analyzing data), while men were high equipment users (collecting data or manipulating the equipment). This pattern varied by group composition, however: men in mixed-gender groups were much more often engaged in Other behaviors (primarily talking to their peers), while men in single-gender groups were the high equipment users. Within-group analyses indicated that these differences were a result of group members taking on distinct roles, rather than whole groups tending towards similar behaviors. The role division was not a result of explicit allocation between group members.

Research indicates that providing students with more authentic lab experiences, often by removing structure to grant students more agency, improves student attitudes towards science and engagement in high-level scientific practices [39, 52–55]. The results here suggest that by removing structure in labs, these curricula facilitate student-driven group work and open up a new set of group roles, but may unintentionally create inequitable learning environments or provide the opportunity for underlying inequities to manifest. Increased student agency, on its own, is insufficient for the creation of a supportive and equitable learning environment where each student has the opportunity to freely pursue their own path in physics.
Equitable participation must be actively built into curricula, to eliminate implicit inequities that can go on behind the scenes.

We have found that inquiry-based labs, designed to support student decision-making, increased the variation in student behaviors when compared to the more traditional lab structure. Working collectively in groups, with a pedagogical structure that facilitated group work (such as having one electronic notebook per group as opposed to identical, individual worksheets), opened up new group roles and increased the range of behaviors students took on. Removing structure in lab activities so that students may take on a variety of roles supports a variety of student experiences during an activity. Through these experiences, we may communicate to students that there are multiple ways to contribute to science and to be a physicist.

However, the freedom for students to fall into roles without any guidance or pedagogical structure has the potential to introduce problematic inequities. While one could argue that allowing students to assume the roles they are more comfortable with may increase persistence in the course (regardless of whether or not those roles are gendered), we note that in the absence of structured, equitable participation and group work, students may inadvertently fall back on cultural norms and expectations when taking on roles within their group, and may rely on implicit biases when making these decisions. Each student's experience is unique in a classroom, but systematic differences in these experiences may have unintended, detrimental consequences. In this study, systematic gendered inequities (with men and women systematically taking on different group roles) and group behavior that depends on group composition (men behaving differently when in groups with other men versus when there is at least one woman) were statistically apparent only in a curriculum that provided ample agency.
If such differences are supported in institutional settings, they can contribute to increased gender segregation through students' educational experience, and ultimately contribute to the large gender imbalance seen in the field as a whole.

The focus of this study was intentionally directed at students primarily intending to major in physics. While this narrow population limits generalization to non-physics majors, it provides vital information with regards to group work, which has potential implications for students' identities as physicists and decisions to persist in physics [26]. This work also has implications for instruction. Our data, however, do not speak to the efficacy of different approaches at mitigating the issues observed. We can draw from previous literature to propose strategies that should be studied. For example, it has been shown that increased pedagogical structure combined with active learning can reduce the achievement gap in class work [56]. Therefore, actively building group roles into lab curricula (similar to those of cooperative grouping [57]), such as "group PI", "reviewer", or "science communicator", so that students actively think about how roles are assigned and make deliberate choices regarding role division, could alleviate the unintended consequences of subconsciously acting on implicit biases; this is the focus of further research.

Previous work has identified many structural manipulations that support equitable participation in other learning environments [e.g., 57, 58]. Our results highlight that there may be unique challenges to equity in inquiry lab environments, where students divide roles associated with distinct experimentation tasks (such as analyzing data or handling equipment). The existence of role division is not inherently problematic.
However, the different roles physics students take on can greatly influence their unique experience, identity formation, and sense of belonging, which, in turn, ultimately impact persistence and representation in the field [26, 59, 60]. With many calls to reform lab instruction to provide students with more authentic experiences and less structure, researchers have a responsibility to evaluate the potential side effects of such interventions. Given the many issues in representation and persistence in STEM, students' experiences should not be sacrificed for the increased learning benefits of these kinds of labs. Instructors have the responsibility of ensuring that the desired aspects of research and academia are being reinforced in these learning environments, and that we are not inadvertently reinforcing gendered roles by failing to actively intervene.
Appendix A: Statistical Analysis of Intragroup Variances
Here we present our intragroup analysis procedure to investigate whether roles emerged within individual groups. Each lab period involved groups of students working together as a team to progress through an experiment. We compared each student profile in a group to their group's average profile and quantified how a student deviated from their group's average for each code. Rescaling each profile in a group with respect to that group's average reveals the variations between the group members' behaviors. We then compared whether there were any significant differences between the relative behaviors of men and women.

We quantified the relative behaviors of students by constructing each student's deviating profile. If the coded behaviors were distributed equally within a group, then the observations of each student would match the group's average for each code. Denoting the observed count of a coarse behavior code for student S in group G with N^S_code, the expected count of that code for a student in that group is

    ⟨N_code⟩_G = (1/M_G) Σ_{S ∈ G} N^S_code,    (A1)

where the sum runs over each student in one group with M_G total group members. From this expectation value, we calculate how student S deviates from their group's average:

    ΔN^S_code = N^S_code − ⟨N_code⟩_G.    (A2)

These deviations reveal interesting behavior trends within groups. For instance, a student engaging in a particular task more than their group members would be revealed by a large and positive ΔN.

The distribution of deviations ΔN for each code provides information about task division within groups. When the distribution of deviations contains all group members, the mean is constrained to zero since the students' deviations cancel each other out by definition. However, the variance of these distributions for each code (the intragroup variance, defined as VAR(ΔN)) is not constrained and provides a measure of the amount of task division within a group. Zero variance among deviations would imply the students' behaviors are completely indistinct from one another, while a large variance would reveal a greater degree of divide-and-conquer.

In Fig. 8(a), we plot the intragroup variances for the two lab types. The relative behaviors within groups from the inquiry labs were highly varied when compared to the traditional labs, which exhibited remarkably less variance for all codes except Paper. This result was confirmed with Levene's test to assess the equality of variances, where none of the p-values from the test statistic for each code exceeded 10^-…. This disparity among intragroup variances was expected, as the traditional labs were highly guided and students were required to fill out their own worksheet, while the inquiry labs were less guided and students were given more agency for active decision-making about the experiment. The large intragroup variances in the inquiry lab groups signify a higher degree of task division taking place.

To investigate task division in the inquiry lab groups, we compared the intragroup variances for different group compositions (Fig. 8(b)). We found comparable intragroup variances regardless of the group's composition for all behavior codes, with every code's p-value from Levene's test exceeding the p = 0.01 cutoff (p-values ranging from p = 0.… to p = 0.…).

FIG. 9. Histograms of intragroup deviations for men and women within the inquiry lab's mixed-gender groups for the Desktop, Equipment, Laptop, and Other behavior codes, with the y-axis representing the number of student profiles. Each student deviates from their group's average by ΔN (defined in Eq. A2). A positive ΔN denotes a student engaging in a behavior more often relative to their group members. We quote p-values calculated from the Mann-Whitney U test statistic on all plots and find significant differences between men and women for the Laptop and Other codes. We also find a borderline result of men handling the equipment more than their group members.

ACKNOWLEDGMENTS
We thank the teaching assistants and lab instructors for the course used in this study for their invaluable support and cooperation. We also thank Chris Gosling for valuable conversations and insight, and James Sethna for helpful feedback. This material is based upon work supported by the National Science Foundation under Grant No. 1836617, the President's Council for Cornell Women's Affinito-Stewart Grant, and the Cornell University College of Arts and Sciences Active Learning Initiative.

[1] H. Pettersson, Science Studies, 47 (2011).
[2] R. Scherr, Phys. Rev. Phys. Educ. Res., 020003 (2016), https://doi.org/10.1103/PhysRevPhysEducRes.12.020003.
[3] A. Madsen, S. B. McKagan, and E. C. Sayre, Phys. Rev. ST Phys. Educ. Res., 020121 (2013), https://doi.org/10.1103/PhysRevSTPER.9.020121.
[4] S. Andersson and A. Johansson, Phys. Rev. Phys. Educ. Res., 020112 (2016).
[5] L. J. Sax, G. Holton, V. Van Horne, I. Nair, C. Davis, A. Ginorio, C. Hollenshead, B. Lazarus, and P. Rayman, The Review of Higher Education (2001), https://doi.org/10.1353/rhe.2000.0030.
[6] S. L. Eddy and S. E. Brownell, Phys. Rev. Phys. Educ. Res., 020106 (2016).
[7] K. Rosa and F. M. Mensah, Phys. Rev. Phys. Educ. Res., 020113 (2016).
[8] J. M. Nissen and J. T. Shemwell, Phys. Rev. Phys. Educ. Res., 020105 (2016).
[9] Z. Y. Kalender, E. Marshman, C. D. Schunn, T. J. Nokes-Malach, and C. Singh, Phys. Rev. Phys. Educ. Res., 020119 (2019).
[10] K. L. Lewis, J. G. Stout, S. J. Pollock, N. D. Finkelstein, and T. A. Ito, Phys. Rev. Phys. Educ. Res., 020110 (2016).
[11] P. W. Irving and E. C. Sayre, Cultural Studies of Science Education, 1155 (2016).
[12] Z. Y. Kalender, E. Marshman, C. D. Schunn, T. J. Nokes-Malach, and C. Singh, Phys. Rev. Phys. Educ. Res., 020148 (2019).
[13] P. W. Irving and E. C. Sayre, Phys. Rev. ST Phys. Educ. Res., 020120 (2015).
[14] E. Wenger, R. A. McDermott, and W. Snyder, Cultivating Communities of Practice: A Guide to Managing Knowledge (Harvard Business Press, 2002) p. 284.
[15] A. J. Gonsalves, A. Danielsson, and H. Pettersson, Phys. Rev. Phys. Educ. Res., 020120 (2016).
[16] B. Francis, L. Archer, J. Moote, J. DeWitt, E. MacLeod, and L. Yeomans, Sex Roles, 156 (2017).
[17] S. L. Li and D. Demaree, AIP Conference Proceedings, 247 (2012), https://aip.scitation.org/doi/pdf/10.1063/1.3680041.
[18] J. Stake and S. Nickens, Sex Roles, 1 (2005).
[19] E. W. Close, J. Conn, and H. G. Close, Phys. Rev. Phys. Educ. Res., 010109 (2016).
[20] R. Lock, J. Castillo, Z. Hazari, and G. Potvin, in Physics Education Research Conference 2015, PER Conference (College Park, MD, 2015) pp. 199–202.
[21] Z. Hazari, E. Brewe, R. M. Goertzen, and T. Hodapp, The Physics Teacher, 96 (2017), https://doi.org/10.1119/1.4974122.
[22] A. T. Danielsson and C. Linder, Gender and Education, 129 (2009).
[23] American Association of Physics Teachers, AAPT Recommendations for the Undergraduate Physics Laboratory Curriculum, Tech. Rep. (American Association of Physics Teachers, 2014).
[24] N. W. Brickhouse, Journal of Research in Science Teaching, 282 (2001).
[25] L. Archer, J. Moote, B. Francis, J. DeWitt, and L. Yeomans, American Educational Research Journal, 88 (2017).
[26] H. B. Carlone and A. Johnson, Journal of Research in Science Teaching, 1187 (2007), https://onlinelibrary.wiley.com/doi/pdf/10.1002/tea.20237.
[27] J. Butler, Gender Trouble: Feminism and the Subversion of Identity (Routledge, New York, 1999).
[28] D. Doucette, R. Clark, and C. Singh, European Journal of Physics, 035702 (2020).
[29] P. Heller, R. Keith, and S. Anderson, American Journal of Physics, 627 (1992), https://doi.org/10.1119/1.17117.
[30] J. P. Adams, G. Brissenden, R. S. Lindell, T. F. Slater, and J. Wallace, Astronomy Education Review (2002), http://dx.doi.org/10.3847/AER2001002.
[31] S. L. Eddy, S. E. Brownell, and M. P. Wenderoth, CBE-Life Sciences Education, 478 (2014), pMID: 25185231, https://doi.org/10.1187/cbe.13-10-0204.
[32] J. Jovanovic and S. S. King, American Educational Research Journal, 477 (1998), https://doi.org/10.3102/00028312035003477.
[33] N. G. Holmes, I. Roll, and D. A. Bonn, Physics in Canada, 1 (2014).
[34] J. Day, J. B. Stang, N. G. Holmes, D. Kumar, and D. A. Bonn, Phys. Rev. Phys. Educ. Res., 020104 (2016).
[35] M. Laeser, B. M. Moskal, R. Knecht, and D. Lasich, Journal of Engineering Education, 49 (2003), https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.2168-9830.2003.tb00737.x.
[36] A. L. Traxler, X. C. Cid, J. Blue, and R. Barthelemy, Phys. Rev. Phys. Educ. Res., 020114 (2016).
[37] C. Gosling, Journal of Belonging, Identity, Language, and Diversity (2017).
[38] J. Jovanovic and S. S. King, American Educational Research Journal, 477 (1998).
[39] N. G. Holmes, C. E. Wieman, and D. A. Bonn, PNAS, 11199 (2015).
[40] E. M. Smith, M. M. Stein, C. Walsh, and N. G. Holmes, Phys. Rev. X, 011029 (2020).
[41] N. G. Holmes and E. M. Smith, The Physics Teacher, 296 (2019), https://doi.org/10.1119/1.5098916.
[42] N. G. Holmes, B. Keep, and C. E. Wieman, Phys. Rev. Phys. Educ. Res., 010109 (2020).
[43] J. H. Corpus and S. V. Wormington, The Journal of Experimental Education, 480 (2014), https://doi.org/10.1080/00220973.2013.876225.
[44] J. A. Schmidt, J. M. Rosenberg, and P. N. Beymer, Journal of Research in Science Teaching, 19 (2017), https://onlinelibrary.wiley.com/doi/pdf/10.1002/tea.21409.
[45] J. A. Hartigan and M. A. Wong, Journal of the Royal Statistical Society, Series C (Applied Statistics), 100 (1979).
[46] R. L. Thorndike, Psychometrika, 267 (1953).
[47] L. v. d. Maaten and G. Hinton, Journal of Machine Learning Research, 2579 (2008).
[48] O. Friard and M. Gamba, Methods in Ecology and Evolution, 1325 (2016), https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/2041-210X.12584.
[49] C. Hasse, European Journal of Psychology of Education, 149 (2008).
[50] B. Davies and R. Harré, Journal for the Theory of Social Behaviour, 43 (1990), https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1468-5914.1990.tb00174.x.
[51] M. Berge and A. T. Danielsson, Research in Science Education, 1177 (2013).
[52] B. R. Wilcox and H. J. Lewandowski, Physical Review Physics Education Research, 020132 (2016).
[53] E. Etkina, A. Karelina, M. Ruibal-Villasenor, D. Rosengrant, R. Jordan, and C. E. Hmelo-Silver, Journal of the Learning Sciences, 54 (2010).
[54] S. E. Brownell, M. J. Kloser, T. Fukami, and R. Shavelson, Journal of College Science Teaching, 36 (2012).
[55] D. J. Adams, Bioscience Education, 1 (2009).
[56] D. C. Haak, J. HilleRisLambers, E. Pitre, and S. Freeman, Science, 1213 (2011), https://science.sciencemag.org/content/332/6034/1213.full.pdf.
[57] P. Heller and M. Hollabaugh, American Journal of Physics, 637 (1992).
[58] K. D. Tanner, Cell Biology Education, 322 (2013).
[59] K. Rainey, M. Dancy, R. Mickelson, E. Stearns, and S. Moller, International Journal of STEM Education, 10 (2018).
[60] A. J. Fisher, R. Mendoza-Denton, C. Patt, I. Young, A. Eppig, R. L. Garrell, D. C. Rees, T. W. Nelson, and M. A. Richards, PLOS ONE 14.