Gendered Performance Differences in Introductory Physics: A Study from a Large Land-Grant University
Matthew Dew, Jonathan Perry, Lewis Ford, William Bassichis, Tatiana Erukhimova
Department of Physics and Astronomy, Texas A&M University, College Station, TX 77845 ∗
Department of Physics, University of Texas, Austin, TX 78712
(Dated: November 17, 2020)
Studies examining gender differences in introductory physics show a consensus when it comes to a gender gap on conceptual assessments; however, the story is not as clear when it comes to differences in gendered performance on exams. This study examined whether gendered differences exist on midterm and final exams in introductory physics courses and if such differences were correlated with a gender difference in final course grades. The population for this study included more than 10,000 students enrolled in algebra- and calculus-based introductory physics courses between spring 2007 and spring 2019. We found a small but statistically significant difference, with a weak effect size, in final letter grades for only one out of four courses: algebra-based mechanics. On midterm exams, statistically significant differences were noted for some exams in three out of four courses, with algebra-based electricity and magnetism being the exception. In all statistically significant cases, the effect size was small or weak, indicating that performance on exams and final letter grades was not strongly dependent on gender. As an added dimension examining gendered differences, we investigated if differences exist when accounting for instructor gender. Additionally, a questionnaire was administered in fall 2019 to more than 1,600 students in both introductory sequences to explore students’ perceptions of performance, class contributions, and inclusion. We observed some differences between students’ perception of their performance and contribution when grouped by gender, but no difference on perception of inclusion.
I. INTRODUCTION
Over the last half century, the number of US students majoring in STEM fields has more than doubled [1]. As this enrollment has increased, so has the attention paid to who is obtaining degrees across different disciplines, particularly when it comes to underrepresented groups. While some STEM disciplines, such as biology, have relative parity between males and females attaining degrees, other disciplines have a persisting gender gap [2]. The National Center for Science and Engineering Statistics found that in 2016, women earned 20.9% of all engineering bachelor’s degrees and 19.3% of all physics degrees [3].
Out of all STEM disciplines, physics is often considered to be the field that is the least welcoming for women to join [4, 5]. Even STEM majors outside physics have to take physics as part of their academic program. For physics and engineering majors, introductory physics courses are among their early experiences in college. Such experiences can be crucial for student success within their majors [6].
Such experiences are especially important for female students, as many of them leave physical science and engineering tracks during the first two years of college [7]. Female students are likely to be under the pressure of gender stereotypes and societal biases [8, 9], and they often find themselves underrepresented in their physics classes. Some authors argue that stereotype threat influences female student performance in introductory physics classes [10, 11] and that use of an intervention based on value affirmation can help improve the situation [12]. Perhaps related to stereotyping, the atmosphere in physics classrooms can influence female students’ physics self-efficacy, self-identity, and motivation, all of which can have an impact on student success and retention [13–17].
A number of studies have reported on the difference in physics self-efficacy between male and female students, including in courses that use research-supported instructional methods [18–22]. As an example, Marshman et al. reported that female students had significantly lower self-efficacy than male students throughout a two-semester introductory physics course sequence [23]. They go on to note that the physics self-efficacy of female students was negatively impacted by both traditional instruction courses and flipped classroom courses.
A vast literature exists that explores the gendered differences in student performance on concept inventory tests in introductory physics courses. The majority of existing studies report a persistent gender gap with males performing significantly better than females on introductory mechanics concept inventory assessments [24–29], with some authors arguing that removing gender-biased context can reduce the gap [27, 28, 30]. The gender gap has been found in conceptual inventories of electromagnetism as well, although to a lesser extent and with more variation across studies [24, 28, 29, 31, 32].
The results of prior studies on the gendered differences in student performance based on course grades and examinations are less consistent: while a number of studies indicate that male students outperform female students on the exams and course grades [12, 33, 34], other groups found no significant gendered difference in student performance [25, 26, 29, 31, 32, 35, 36]. One study, comprising 4,000 students across 7 semesters at the University of Colorado Boulder, reported a small but significant difference in course grades, correlated with differences in background factors for males and females [33].
Factors beyond the course, including prior knowledge, math background, and attitudes towards science, have been seen to correlate with gendered differences in performance [33, 37]. One study of an electricity and magnetism course by Andersson and Johansson argues that the gendered difference in course grades disappears when controlled for the program in which a student is enrolled [38]. Tai and Sadler show that females outperformed males in algebra-based courses while males outperformed females with the same background in calculus-based courses [39]. Several studies performed on a large number of students taking the introductory physics classes report no significant gendered difference in student performance on course exams and course grades but a gender gap in concept inventories [25, 26, 31]. Some studies of gendered differences in undergraduate physics have reported reduction or elimination through the use of carefully selected instructional strategies in introductory physics [21, 40, 41]. However, other groups have found no effect of applying selected pedagogies or controlling the prior knowledge factors on gendered performance [24, 29, 42, 43].
This work focused on expanding the studies of academic gendered differences through large enrollment courses at Texas A&M University (TAMU). This is a large, land-grant institution, which yearly serves more than 20,000 undergraduate STEM majors across multiple colleges [44]. STEM majors at TAMU complete their introductory physics sequence through either calculus-based courses (e.g. physics, engineering, chemistry, math) or algebra-based courses (e.g. life science majors, pre-meds, and environmental science). The engineering program comprises more than half of all STEM majors at TAMU, so the calculus-based sequence enrollment is much larger than the algebra-based sequence. Students’ demographics, academic goals, and attitudes towards physics may differ significantly between calculus-based and algebra-based courses.
Most students in the calculus-based courses were in their freshman year, i.e. right after high school. The algebra-based courses are typically taken by upper-level students in their sophomore to senior year, who do not have physics or physics-related disciplines as the focus of their studies and careers. Furthermore, there is a much larger proportion of female students in algebra-based courses.
Our study aimed to examine gendered differences on both midterm and final exams as well as final letter grades for algebra- and calculus-based introductory physics courses at TAMU. The objective of this study was to investigate whether gender differences on midterm and final exams existed and were correlated with a gender difference in final course grades. Prior literature reviewed above indicates that there is no consensus on whether there is a gender difference in student final exam and course grades in the introductory physics classes. The question is still open and requires further investigation. This makes our study, which includes a large data set spanning a long time period and 19 instructors, particularly important, as it reduces variability related to individual instructors and courses. We analyzed all exam scores during the entire semester rather than the final grades only. We examined if statistically significant differences based on gender occurred on each exam for four introductory physics courses as the semesters were progressing, which has not previously been studied, at least not for such a large database and over a long period of time. To accomplish this task we created a database of course grades and scores from midterms and final exams for more than 10,000 students over a decade from two introductory physics course sequences: both calculus-based and algebra-based. As an added dimension to this study, we also looked at whether differences in student performance would be observed when separating by instructor gender.
Beyond the database of grades mentioned above, we also took a snapshot of students’ feelings via a short questionnaire to see how their perceptions aligned with historic performance.
II. METHODS
From here forward, “significant” will be used as shorthand for “statistically significant”. Statistical significance was taken to be at p < .05. In addition, tables will use “Mech.” or “E&M” for mechanics or electricity and magnetism, respectively; “Alg.” or “Calc.” stand for algebra-based or calculus-based.
A. Course Data
To examine the gendered student performance within introductory courses, course level data were requested from faculty who taught one or more of these courses since 2007. Participating instructors provided students’ first names, numerical scores for all midterm and final exams, and the final letter grade for the course. After collecting data from faculty, a database of approximately 13,000 students was obtained. This database contained information for students enrolled in the algebra-based sequence between 2011-2019 and the calculus-based sequence between 2007-2017. This study was structured in such a way that the only data collected were the course-level information provided by faculty. For this reason, connecting outcomes to non-academic factors was not possible with this study.
Since course-level data only included student names, gender was identified using an online tool, GenderizeIO [45]. This application program interface uses census data to return a probability of gender based on the input of a first name and has been used in prior studies to attribute gender when these data were not available, e.g. Huang et al. 2020 [46]. Gender probability was considered identifiable for this study if it was 90% or higher. This percentage was chosen as it allowed reasonable certainty of gender without drastically reducing the size of our data set. This cut eliminated about 17% of our raw sample. The number of students identified as male or female from each of the four introductory courses examined in this study is shown in Table I.
TABLE I. Number of students and their gender distribution for each of the four introductory courses from the algebra- and calculus-based sequences.
Course         Total Number   Male     Female
Mech. Alg.     1,267          44.4%    55.6%
E&M Alg.       999            44.6%    55.4%
Mech. Calc.    5,449          74.9%    25.1%
E&M Calc.      2,793          80.8%    19.2%
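As an illustration, the 90% probability cut described above can be sketched in Python. This is a minimal sketch, not the processing used in the study: the record layout (`name`, `gender`, `probability`), the sample values, and the function names are all hypothetical, and no call to the online tool is made.

```python
from typing import Optional

# Hypothetical lookup results: one dict per student first name, holding the
# gender label and probability that a name-based tool might return.
# These values are invented for illustration only.
SAMPLE_LOOKUPS = [
    {"name": "name_a", "gender": "female", "probability": 0.98},
    {"name": "name_b", "gender": "male", "probability": 0.87},
    {"name": "name_c", "gender": "male", "probability": 0.90},
    {"name": "name_d", "gender": None, "probability": 0.0},
]


def identifiable_gender(record: dict, threshold: float = 0.90) -> Optional[str]:
    """Return the gender label if its probability meets the threshold, else None."""
    gender = record.get("gender")
    probability = record.get("probability", 0.0)
    if gender in ("male", "female") and probability >= threshold:
        return gender
    return None


def apply_cut(records, threshold: float = 0.90):
    """Keep only (name, gender) pairs whose gender is considered identifiable."""
    kept = []
    for record in records:
        gender = identifiable_gender(record, threshold)
        if gender is not None:
            kept.append((record["name"], gender))
    return kept


kept = apply_cut(SAMPLE_LOOKUPS)
dropped_fraction = 1 - len(kept) / len(SAMPLE_LOOKUPS)
```

Raising or lowering `threshold` trades certainty of the gender label against the fraction of the sample that is discarded, which is the trade-off the 90% choice reflects.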
B. Comparing Grades
Differences based on gender were examined by looking at students’ final course grades and their scores on the midterm and final semester exams. Differences in populations were examined using t-tests between transformed data, as well as analysis of variance (ANOVA) applied to raw scores. These methods were used to examine the null hypothesis that there is no difference in performance between male and female students. Comparisons were made based on student gender, instructor, and year in the course. Some instructors gave multiple years of data, so these criteria allowed for individual lecture section distributions to be examined against each other. Though the exam distributions can be skewed for both raw scores and z-scores (see Figure 1 for raw Exam 1 scores from calculus-based mechanics), violating the assumption of normality but not the assumption of homogeneity of variance according to Levene’s test, t-tests remained the most appropriate statistical analysis due to the large sample size from each course [47]. Effect sizes were calculated using Cohen’s d with a Hedges correction [48]. We consider d < 0.2 to be a weak effect, 0.2 < d < 0.5 a small effect, 0.5 < d < 0.8 a medium effect, and d > 0.8 a large effect.
A z-score takes a raw score ($x_i$), subtracts the average ($\bar{x}$), and scales by the standard deviation ($\sigma$), according to the relation:
$$z = \frac{x_i - \bar{x}}{\sigma}. \tag{1}$$
A positive z-score indicates how much higher a raw score was compared to the average in units of standard deviation. A negative z-score indicates the same but for a raw score below the average [48]. This transformation of scores was performed for a more even comparison of exam distributions across multiple years and instructors. For instance, Professor A teaching a course in year X might have a higher average and smaller deviation than when the same instructor teaches the same course in year Y. As an illustration of the z-score transformation, we can look at raw exam scores from calculus-based mechanics. In 2007, the mean score was 58 points and the standard deviation was 21 points.
Students scoring a 58 were mapped to a z-score of 0, while students scoring a 37 were mapped to a z-score of -1. This was done for all students, using the individual lecture section averages and standard deviations.
Raw scores were used to examine individual lecture sections. When comparing across lecture sections, raw scores were transformed into z-scores so distributions may be more adequately and fairly compared to one another. Final course grades were treated on a 4-point scale (A-F) with no plus or minus grades per TAMU’s grading policy.
FIG. 1. Raw score distribution of grades on the first calculus-based mechanics exam by year. For each box, the middle line represents the median, the box represents the two middle quartiles, and the error bars represent the highest and lowest quartiles.
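The per-section z-score transformation of Eq. (1) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `(section_id, raw_score)` row layout and the function names are hypothetical, and the population standard deviation is used because the text does not say whether the population or sample form was applied.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_score(raw: float, average: float, sd: float) -> float:
    """Eq. (1): distance of a raw score from the average, in units of sd."""
    return (raw - average) / sd


def section_z_scores(rows):
    """Transform (section_id, raw_score) rows using each section's own mean and sd.

    Assumption: population standard deviation (pstdev); a section whose scores
    are all identical gets z = 0 for every student.
    """
    rows = list(rows)
    by_section = defaultdict(list)
    for section, score in rows:
        by_section[section].append(score)

    stats = {s: (mean(v), pstdev(v)) for s, v in by_section.items()}

    result = []
    for section, score in rows:
        average, sd = stats[section]
        z = z_score(score, average, sd) if sd > 0 else 0.0
        result.append((section, score, z))
    return result


# Worked example from the text: mean 58, standard deviation 21.
z_at_mean = z_score(58, 58, 21)
z_one_below = z_score(37, 58, 21)

# Two hypothetical sections with different averages map onto the same scale.
demo = section_z_scores([("A", 40), ("A", 60), ("B", 70), ("B", 90)])
```

Because each section is standardized against its own mean and deviation, a student 10 points below the average of a low-scoring section and one 10 points below the average of a high-scoring section receive the same z-score when the spreads match.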
C. Student Perceptions
In fall 2019, a short anonymous questionnaire was administered to the algebra- and calculus-based courses to explore how students felt about their performance and inclusion. It must be noted that this was a questionnaire, rather than a validated survey instrument. The questionnaire was given during a semester that is outside the scope of the course level database described previously. We aimed to take a snapshot of students’ perceptions to see how their responses aligned with historic data.
Students were asked to self-identify their race and gender. Response choices for students identifying as transgender or non-binary were available on the questionnaire. Only students identifying as male or female (99%) were analyzed for this study. The questionnaire was composed of three questions where students responded using a 5-point Likert scale with responses that were negative (very and slightly), neutral, and positive (very and slightly):
1. “I felt included by my peers and instructors within this physics courses.”
2. “I believe that I performed ____ in this course.”
3. “I felt that my contributions to discussions over physics material were valued during this course. This includes discussions both in class and outside of class but relating to completing assignments or preparing for exams.”
The questionnaire was administered in the 12th and 13th weeks of a 15 week semester, which occurred mid-November in fall 2019. We chose these weeks as it was far enough into the semester that students would have formed an opinion of their inclusion but early enough that the questionnaires would not take away from finals preparation. The questionnaire was given during recitation, where students were in smaller groups and did not have their instructor in the room. The brevity of the questionnaire was dictated by the recitation format and an attempt to maximize the response rate.
III. ANALYSIS AND RESULTS
Differential performance for male and female students was examined for all exams and final course grades for four introductory physics courses. Results are separated by courses in the algebra-based sequence and the calculus-based sequence. The calculus-based sequence consists of three midterm exams administered throughout the semester with a comprehensive final (identified as Exam 4). The algebra-based sequence consists of four midterm exams administered throughout the semester with a comprehensive final (identified as Exam 5).
Questionnaire responses were converted into an ordinal 5-point scale, with higher numbers equating to more positive responses: better feelings of inclusion, greater performance, and stronger feelings of making valued contributions.
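The conversion of questionnaire responses to an ordinal scale can be sketched as a simple lookup. The labels below are the answer choices for the performance question listed in Section III D; the function name and the handling of blank or unrecognized responses (they are skipped) are assumptions made for illustration.

```python
# Answer choices for the performance question, mapped so that higher
# numbers correspond to more positive responses.
PERFORMANCE_SCALE = {
    "Well Below Average": 1,
    "Below Average": 2,
    "Average": 3,
    "Above Average": 4,
    "Well Above Average": 5,
}


def encode_responses(responses, scale=PERFORMANCE_SCALE):
    """Map response labels to ordinal values.

    Assumption: blank or unrecognized responses are dropped rather than
    imputed; the study does not describe how such cases were handled.
    """
    return [scale[label] for label in responses if label in scale]


encoded = encode_responses(
    ["Average", "Well Above Average", "", "Below Average"]
)
```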
A. Calculus-Based Mechanics and E&M
For calculus-based mechanics, data were provided by 14 instructors, for a total of 49 lecture sections. Eleven male instructors provided data from 34 lecture sections, and three female instructors provided data from 15 lecture sections. As seen in Table II, there was no significant difference observed when a t-test was applied to final letter grades based on student gender for the pooled data from all instructors and sections.
TABLE II. Average final letter grades (and standard error) by gender for the algebra- and calculus-based introductory sequences, as well as the t-test and significance between these distributions.
Course         Male             Female           t-test    p
Mech. Alg.     2.687 (0.043)    2.839 (0.039)    -2.634    0.009
E&M Alg.       2.935 (0.046)    2.875 (0.041)    0.967     0.334
Mech. Calc.    2.532 (0.018)    2.517 (0.030)    0.430     0.667
E&M Calc.      2.596 (0.024)    2.692 (0.047)    -1.786    0.074
Gendered performance on course exams was compared using t-tests on transformed data from all instructors (Table III). As a combined sample, male students scored higher to a significant degree on the first, third, and final exams of the course. The effect size of these differences is small (0.2 < d < 0.5) for the first exam, and weak (d < 0.2) for the third and final exams.
For calculus-based E&M, data were provided by 10 instructors, for a total of 27 lecture sections. Six male instructors provided data from 10 lecture sections, and four female instructors provided data from 17 lecture sections. As seen in Table II, there was no significant difference observed when a t-test was applied to final letter grades based on student gender for the pooled data from all instructors and sections.
As with the calculus-based mechanics course, exams were compared using t-tests on transformed data from all instructors (Table IV). Similar to the first semester course, male students scored at least slightly higher than female students on all exams. In this case, none of these differences were significant.
As a validation of results found using t-tests on transformed data, three-way ANOVA was applied to raw scores for all exams from both calculus-based courses. Results were in agreement with t-tests applied to transformed data. Where significant differences were observed using t-tests, ANOVA showed gender to be a significant factor on its own or in combination with one or both of the other factors of professor and year. For the mechanics course, the F-statistic was significant at the p < [...] p < .05 level for the third exam, and p < .05 level for the final exam. When examining individual lecture sections, significant differences due to gender were observed for less than 20% of lecture sections. Combined with the results above, we note a persistent gender difference in calculus-based mechanics on exams only for pooled data, producing no significant difference in final course grades. No gendered differences were noted in calculus-based E&M for either exams or final course grades.
TABLE III. Average exam z-scores (and standard error) for calculus-based mechanics, as well as the t-test and significance between these distributions.
          Male             Female            t-test    p
Exam 1    0.060 (0.015)    -0.180 (0.027)    7.780     < [...]
[...]
TABLE IV. Average exam z-scores (and standard error) for calculus-based E&M, as well as the t-test and significance between these distributions.
          Male             Female            t-test    p
Exam 1    0.013 (0.021)    -0.053 (0.042)    1.360     0.174
Exam 2    0.008 (0.021)    -0.034 (0.043)    0.862     0.389
Exam 3    0.005 (0.021)    -0.020 (0.042)    0.502     0.616
Exam 4    0.014 (0.021)    -0.057 (0.043)    1.475     0.140
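As a rough consistency check, a t statistic can be approximated from the tabulated means and standard errors alone. The sketch below uses an unpooled (Welch-style) form, which is an assumption: it is not necessarily the exact test applied in this study, so small discrepancies from the tabulated values are expected. The function name is hypothetical; the inputs are the calculus-based E&M Exam 4 values from the table above.

```python
import math


def t_from_summary(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Approximate t statistic from two group means and their standard errors.

    Unpooled form: difference of means divided by the root sum of squared
    standard errors. Assumption: groups are independent.
    """
    return (mean_a - mean_b) / math.sqrt(se_a**2 + se_b**2)


# Calculus-based E&M, Exam 4: male 0.014 (0.021), female -0.057 (0.043).
t_exam4 = t_from_summary(0.014, 0.021, -0.057, 0.043)
```

For these inputs the approximation comes out close to, but not identical with, the tabulated value of 1.475, which is consistent with rounding of the reported means and standard errors and with a possibly different variance treatment.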
B. Algebra-Based Mechanics and E&M
For algebra-based mechanics, data were provided by 4 instructors, covering a total of 13 lecture sections. Three male instructors provided data from 11 lecture sections, and one female instructor provided data from 2 lecture sections. Comparing the final letter grades based on student gender for pooled data from all instructors and sections showed a significant difference with a weak effect size (d < 0.2) [...] the F-statistic was significant at the p < [...] p < .01 level for the third exam, and p < .001 level for the fourth exam. For the E&M course, the F-statistic was significant at the p < .05 level for the first exam. When examining individual lecture section level data, significant differences due to gender were observed for less than 15% of lecture sections. Combined with the results above, we note a gender difference in algebra-based mechanics on the third and fourth exams only for pooled data, producing a small but significant difference in final course grades. A gender difference is observed in algebra-based E&M only for the first exam, producing no difference in final course grades.
TABLE V. Average exam z-scores (and standard error) for algebra-based mechanics, as well as the t-test and significance between these distributions.
          Male              Female            t-test    p
Exam 1    0.012 (0.042)     -0.010 (0.038)    0.396     0.692
Exam 2    -0.006 (0.042)    0.005 (0.038)     -0.201    0.841
Exam 3    -0.071 (0.041)    0.056 (0.038)     -2.248    0.025
Exam 4    -0.089 (0.043)    0.071 (0.037)     -2.827    0.005
Exam 5    -0.029 (0.042)    0.023 (0.038)     -0.923    0.356
TABLE VI. Average exam z-scores (and standard error) for algebra-based E&M, as well as the t-test and significance between these distributions.
          Male              Female            t-test    p
Exam 1    0.077 (0.047)     -0.062 (0.043)    2.182     0.029
Exam 2    0.053 (0.048)     -0.042 (0.042)    1.492     0.136
Exam 3    -0.008 (0.045)    0.007 (0.044)     -0.235    0.814
Exam 4    0.023 (0.049)     -0.019 (0.041)    0.653     0.514
Exam 5    0.022 (0.047)     -0.018 (0.043)    0.624     0.533
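To illustrate how an effect size with a Hedges correction can be obtained from summary statistics, the sketch below computes it for Exam 4 of algebra-based mechanics. Several inputs are assumptions rather than reported values: the group sizes (563 male, 704 female) are approximated from the Table I total and percentages, the standard deviations are recovered from the tabulated standard errors as SE times the square root of n, and the function names and the handling of boundary values in `effect_label` are hypothetical.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d with a pooled standard deviation, times the Hedges correction."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    d = (mean_a - mean_b) / math.sqrt(pooled_var)
    # Common approximation of the small-sample correction factor.
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction


def effect_label(g: float) -> str:
    """Label an effect size using the thresholds 0.2, 0.5, and 0.8.

    Assumption: values exactly on a threshold fall into the larger category.
    """
    size = abs(g)
    if size < 0.2:
        return "weak"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Approximate group sizes derived from Table I (1,267 students, 44.4% male).
n_male, n_female = 563, 704

# Exam 4 z-score means and standard errors from Table V.
mean_male, se_male = -0.089, 0.043
mean_female, se_female = 0.071, 0.037

# Recover standard deviations from standard errors: sd = se * sqrt(n).
sd_male = se_male * math.sqrt(n_male)
sd_female = se_female * math.sqrt(n_female)

g_exam4 = hedges_g(mean_male, sd_male, n_male, mean_female, sd_female, n_female)
label_exam4 = effect_label(g_exam4)
```

With samples of this size the correction factor is very close to 1, so the corrected and uncorrected effect sizes are nearly identical; the correction matters mainly for individual lecture sections with few students.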
C. The Effect of Instructor Gender
The impact of instructor gender on differences in student performance by gender was also examined. This analysis could not be applied to the algebra-based course sequence as there were data available from only one female instructor for the mechanics course and no data available from female instructors for the E&M course. Data were separated into two groups by instructor gender, and comparisons were made based on student gender. Differences in average z-scores for each course exam are shown in Table VII, while comparisons between letter grades are shown in Table VIII.
For calculus-based mechanics, data were obtained from eleven male instructors for 34 lecture sections (N=4,227) and from three female instructors for 15 lecture sections (N=1,222). Significant differences in student performance based on gender were observed for instructors of both genders on the first midterm exam, with a small effect size for each (0.2 < d < 0.5) [...] the F-statistic was significant for male instructors at the p < .001 level for the first exam, p < .01 level for the third exam, and p < .01 level for the fourth exam. Further examination of the impact of instructor gender was performed using Tukey HSD, which found significance in agreement with t-tests [48].
D. Students’ Perception Questionnaire
A short questionnaire, consisting of three questions that were analyzed independently as well as demographic information, was distributed to the students taking introductory physics classes in the fall of 2019, as described in Section II C. We received over 1,600 completed questionnaires with a response rate of 63%.
On the question about students’ perception of their performance,
2. “I believe that I performed ____ in this course.”
they were given five answer choices:
A. Well Below Average
B. Below Average
C. Average
D. Above Average
E. Well Above Average
We converted these answers into a 5-point scale with “Well Below Average” corresponding to a 1 and “Well Above Average” to a 5.
Using the ordinal scale described above, t-tests were applied to examine the null hypothesis for student perceptions. We found that female students rated their performance perception equally to male students only in algebra-based mechanics (see Table IX). When compared to historic data, this is the one course examined where we found a statistically significant difference in final letter grades with female students outperforming male students.
For the other three courses, male students rated their performance as significantly higher with effect sizes ranging from weak to medium [...] (p < .05) with a medium effect size (0.5 < d < 0.8) for algebra-based and a weak effect size for calculus-based (d < 0.2) [...]
IV. DISCUSSION
We compared student outcomes for final course grades and exams based on gender for over 10,000 TAMU students over more than a decade. We examined these data to determine whether such differences were persistent throughout each course. The data were collected from instructors who taught courses from algebra-based or calculus-based introductory physics sequences. Prior studies of gendered students’ performance based on exam grades show mixed results, with some authors reporting that male students outperform their female counterparts [12, 33, 34], whereas other authors did not find a statistically significant difference between the genders [25, 26, 29, 31, 32, 35, 36]. While this study has the advantage of a large data set of the exam scores collected over a decade and provides new knowledge on the gendered performance in the introductory physics classes, it has limitations: only course-level data collected from faculty were used in this study. Therefore, we did not analyze the impact of non-academic factors that have been seen to potentially account for 20% to 70% of the gender difference [33, 37].
To describe our results, we use “significant” as a shorthand for “statistically significant”, defined as p < .05 [...]
V. CONCLUSION
This study explored gender differences in student performance on exams and final letter grades for algebra- and calculus-based introductory physics sequences. The performance indicators for a large pool of over 10,000 students spanning a period from spring 2007 to spring 2019 have been analyzed. Data on midterm exams, final exams, and final course grades were collected from instructors teaching these courses during that period of time. Our goal was to investigate whether there was a measurable gender difference on midterm and final exam scores and if they were correlated with gender differences in final course grades. By utilizing a large data set, differences due to individual instructors or other transient factors were averaged out. Where differences in final letter grades were found for a course, there were no persistent differences observed across that course’s exams. Where persistent differences were observed on exams within a course, there were no differences for final letter grades in that course. In algebra-based mechanics, female students outperformed male students by a small but statistically significant margin. In all statistically significant cases, the effect size was small or weak, indicating that performance on exams and final letter grades was, at most, weakly dependent on gender. One should keep in mind that the instructors who taught these classes used different instruction methods that were evolving during the period of data collection. The student demographics, background, and preparation levels were also evolving during the years of study [44]. Since this is a long-term historic study of course-level data, we were not able to control for factors such as student background in math and physics.
Our paper provides new data on the gender difference on exams and course grades within introductory physics through a sizable data set collected from more than ten years of courses at a large public university.
Prior to this study, we considered it an open question whether significant gendered differences could appear on particular exams but not be large enough to affect final course grades. Our findings provide new information that performance on exams and final letter grades is weakly dependent on student gender. These results may help with fighting the gender stereotypes that negatively impact so many female students [11].
A questionnaire was distributed to students taking both calculus-based and algebra-based sequences during the fall 2019 semester. The goal was to take a snapshot of current students’ feelings to see how their perceptions aligned with historic performance. We collected students’ feedback on their perception of their performance, feelings of inclusion, and the value of their contributions. The analyses of student responses revealed no difference in the feeling of inclusion in any of these courses. For one course, algebra-based mechanics, female students rated their contributions as valued more compared to male students. For the same course, female students reported their performance perception to be as high as their male counterparts. For the other three courses, male students reported higher perceptions of performance than female students.
There are several future studies that could stem from this one. In the next iteration of this work, it would be beneficial to connect course-level data with university records of students’ prior preparation and knowledge. While gender was not a strong factor leading to differences in student performance across combined samples, a minority of sections did exhibit differences. A subsequent study could examine individual course sections which exhibit statistically significant differences based on student gender for the factors which might contribute to these differences.
Additionally, an enhanced survey on student perceptions could be linked with course performance to allow for regression analyses between inclusion and contribution with success on exams. Since calculus-based mechanics is usually taken the earliest of these four courses, it would be useful to perform a study like this one on calculus 1 and introductory chemistry. This would help us better understand if gendered performance differences among physical science and engineering majors change over time.
ACKNOWLEDGMENTS
This study was supported by the Texas A&M University College of Science. We would like to thank the Texas A&M University Department of Physics and Astronomy faculty who provided us with data for this study.
[1] Paula Heron and Laurie McNeil, Phys21: Preparing Physics Students for 21st-Century Careers (2016).
[2] Anne Marie Porter and Rachel Ivie, Women in Physics and Astronomy, 2019, Tech. Rep. (AIP Statistical Research Center, Melville, NY, 2019).
[3] National Science Foundation, Women, Minorities, and Persons with Disabilities in Science and Engineering: 2019, Special Report NSF 19-304 (National Center for Science and Engineering Statistics, Alexandria, VA, 2019).
[4] Linda J. Sax, Kathleen J. Lehman, Ramón S. Barthelemy, and Gloria Lim, “Women in physics: A comparison to science, technology, engineering, and math education over four decades,” Phys. Rev. Phys. Educ. Res., 020108 (2016).
[5] Laura McCullough, “Women in physics: A review,” The Physics Teacher, 86–91 (2002), https://doi.org/10.1119/1.1457312.
[6] Christine Nord, Shep Roey, Robert Perkins, Marsha Lyons, Nita Lemanski, Yael Tamir, Janis Brown, Jason Schuknecht, and Kathleen Herrold, “The nation’s report card: America’s high school graduates,” (2011), retrieved from https://nces.ed.gov/nationsreportcard/pdf/studies/2011462.pdf.
[7] Irene Goodman, Christine Cunningham, Cathy Lachapelle, Meredith Thompson, Katherine Bittinger, Robert Brennan, and Mario Delci, “Final report of the Women’s Experiences in College Engineering (WECE) project. ERIC ED507395,” (2002), 10.13140/2.1.2027.5524.
[8] Lin Bian, Sarah-Jane Leslie, and Andrei Cimpian, “Gender stereotypes about intellectual ability emerge early and influence children’s interests,” Science, 389–391 (2017), https://science.sciencemag.org/content/355/6323/389.full.pdf.
[9] Nancy M. Hewitt and Elaine Seymour, “A long, discouraging climb,” ASEE Prism, 24–28 (1992).
[10] Gwen C. Marchand and Gita Taasoobshirazi, “Stereotype threat and women’s performance in physics,” International Journal of Science Education, 3050–3061 (2013), https://doi.org/10.1080/09500693.2012.683461.
[11] Alexandru Maries, Nafis I. Karim, and Chandralekha Singh, “Is agreeing with a gender stereotype correlated with the performance of female students in introductory physics?” Phys. Rev. Phys. Educ. Res., 020119 (2018).
[12] Akira Miyake, Lauren E. Kost-Smith, Noah D. Finkelstein, Steven J. Pollock, Geoffrey L. Cohen, and Tiffany A. Ito, “Reducing the gender achievement gap in college science: A classroom study of values affirmation,” Science, 1234–1237 (2010), https://science.sciencemag.org/content/330/6008/1234.full.pdf.
[13] Vashti Sawtelle, Eric Brewe, and Laird H. Kramer, “Exploring the relationship between self-efficacy and retention in introductory physics,” Journal of Research in Science Teaching, 1096–1121 (2012), https://onlinelibrary.wiley.com/doi/pdf/10.1002/tea.21050.
[14] Zahra Hazari, Gerhard Sonnert, Philip M. Sadler, and Marie-Claire Shanahan, “Connecting high school physics experiences, outcome expectations, physics identity, and physics career choice: A gender study,” Journal of Research in Science Teaching, 978–1003 (2010), https://onlinelibrary.wiley.com/doi/pdf/10.1002/tea.20363.
[15] Z. Yasemin Kalender, Emily Marshman, Christian D. Schunn, Timothy J. Nokes-Malach, and Chandralekha Singh, “Gendered patterns in the construction of physics identity from motivational factors,” Phys. Rev. Phys. Educ. Res., 020119 (2019).
[16] Sarah L. Eddy and Sara E. Brownell, “Beneath the numbers: A review of gender disparities in undergraduate education across science, technology, engineering, and math disciplines,” Phys. Rev. Phys. Educ. Res., 020106 (2016).
[17] Jennifer Blue, Adrienne Traxler, and Geraldine Cochran, “Resource letter: GP-1: Gender and physics,” American Journal of Physics, 616–626 (2019), https://doi.org/10.1119/1.5114628.
[18] Jayson M. Nissen and Jonathan T. Shemwell, “Gender, experience, and self-efficacy in introductory physics,” Phys. Rev. Phys. Educ. Res., 020105 (2016).
[19] Christine Lindstrom and Manjula Sharma, “Self-efficacy of first year university physics students: Do gender and prior formal instruction in physics matter?” International Journal of Innovation in Science and Mathematics Education (2010).
[20] Emily Marshman, Zeynep Y. Kalender, Christian Schunn, Timothy Nokes-Malach, and Chandralekha Singh, “A longitudinal analysis of students’ motivational characteristics in introductory physics courses: Gender differences,” Canadian Journal of Physics, 391–405 (2018), https://doi.org/10.1139/cjp-2017-0185.
[21] Tobias Espinosa, Kelly Miller, Ives Araujo, and Eric Mazur, “Reducing the gender gap in students’ physics self-efficacy in a team- and project-based introductory physics class,” Phys. Rev. Phys. Educ. Res., 010132 (2019).
[22] Rachel Henderson, Vashti Sawtelle, and Jayson Micheal Nissen, “Gender & self-efficacy: A call to physics educators,” The Physics Teacher, 345–348 (2020), https://doi.org/10.1119/1.5145533.
[23] Emily M. Marshman, Z. Yasemin Kalender, Timothy Nokes-Malach, Christian Schunn, and Chandralekha Singh, “Female students with A’s have similar physics self-efficacy as male students with C’s in introductory courses: A cause for alarm?” Phys. Rev. Phys. Educ. Res., 020123 (2018).
[24] Adrian Madsen, Sarah B. McKagan, and Eleanor C. Sayre, “Gender gap on concept inventories in physics: What is consistent, what is inconsistent, and what factors influence the gap?” Phys. Rev. ST Phys. Educ. Res., 020121 (2013).
[25] Jennifer Docktor, Kenneth Heller, Charles Henderson, Mel Sabella, and Leon Hsu, “Gender differences in both force concept inventory and introductory physics performance,” AIP Conference Proceedings, 15–18 (2008), https://aip.scitation.org/doi/pdf/10.1063/1.3021243.
[26] Simon Bates, Robyn Donnelly, Cait MacPhee, David Sands, Marion Birch, and Niels R Walet, “Gender differences in conceptual understanding of Newtonian mechanics: a UK cross-institution comparison,” European Journal of Physics, 421–434 (2013).
[27] Adrienne Traxler, Rachel Henderson, John Stewart, Gay Stewart, Alexis Papak, and Rebecca Lindell, “Gender fairness within the force concept inventory,” Phys. Rev. Phys. Educ. Res.
, 010103 (2018).[28] Rachel Henderson, Paul Miller, John Stewart, AdrienneTraxler, and Rebecca Lindell, “Item-level gender fairnessin the force and motion conceptual evaluation and theconceptual survey of electricity and magnetism,” Phys.Rev. Phys. Educ. Res. , 020103 (2018).[29] Steven J. Pollock, Noah D. Finkelstein, and Lauren E.Kost, “Reducing the gender gap in the physics classroom:How sufficient is interactive engagement?” Phys. Rev. STPhys. Educ. Res. , 010107 (2007).[30] Laura McCullough, “Gender, context, and physics as-sessment,” Journal of International Women’s Studies (2004).[31] Rachel Henderson, Gay Stewart, John Stewart, LynnetteMichaluk, and Adrienne Traxler, “Exploring the gen-der gap in the conceptual survey of electricity and mag-netism,” Phys. Rev. Phys. Educ. Res. , 020114 (2017).[32] Lauren E. Kost-Smith, Steven J. Pollock, and Noah D.Finkelstein, “Gender disparities in second-semester col-lege physics: The incremental effects of a “smog of bias”,”Phys. Rev. ST Phys. Educ. Res. , 020112 (2010).[33] Lauren E. Kost, Steven J. Pollock, and Noah D. Finkel-stein, “Characterizing the gender gap in introductoryphysics,” Phys. Rev. ST Phys. Educ. Res. , 010101(2009).[34] Rebecca L. Matz, Benjamin P. Koester, StefanoFiorini, Galina Grom, Linda Shepard, Charles G.Stangor, Brad Weiner, and Timothy A. McKay,“Patterns of gendered performance differences inlarge introductory courses at five research univer-sities,” AERA Open , 2332858417743754 (2017),https://doi.org/10.1177/2332858417743754.[35] Eric Brewe, Vashti Sawtelle, Laird H. Kramer, George E.O’Brien, Idaykis Rodriguez, and Priscilla Pamel´a, “To-ward equity through participation in modeling instruc-tion in introductory university physics,” Phys. Rev. STPhys. Educ. Res. 
, 010106 (2010).[36] Shanda Lauer, Jennifer Momsen, Erika Offerdahl,Mila Kryjevskaia, Warren Christensen, and Lisa Montplaisir, “Stereotyped: Investigating genderin introductory science courses,” CBE—Life Sci-ences Education , 30–38 (2013), pMID: 23463226,https://doi.org/10.1187/cbe.12-08-0133.[37] Shima Salehi, Eric Burkholder, G. Peter Lepage, StevenPollock, and Carl Wieman, “Demographic gaps or prepa-ration gaps?: The large impact of incoming prepara-tion on performance of students in introductory physics,”Phys. Rev. Phys. Educ. Res. , 020114 (2019).[38] Staffan Andersson and Anders Johansson, “Gender gapor program gap? students’ negotiations of study practicein a course in electromagnetism,” Phys. Rev. Phys. Educ.Res. , 020112 (2016).[39] Robert H. Tai and Philip M. Sadler, “Gender differencesin introductory undergraduate physics performance:University physics versus college physics in the usa,” In-ternational Journal of Science Education , 1017–1037(2001), https://doi.org/10.1080/09500690010025067.[40] Mercedes Lorenzo, Catherine H. Crouch, and EricMazur, “Reducing the gender gap in the physics class-room,” American Journal of Physics , 118–122 (2006),https://doi.org/10.1119/1.2162549.[41] Idaykis Rodriguez, Geoff Potvin, and Laird H. Kramer,“How gender and reformed introductory physics impactsstudent success in advanced physics courses and continu-ation in the physics major,” Phys. Rev. Phys. Educ. Res. , 020118 (2016).[42] Nafis I. Karim, Alexandru Maries, and ChandralekhaSingh, “Do evidence-based active-engagement courses re-duce the gender gap in introductory physics?” EuropeanJournal of Physics , 025701 (2018).[43] Michael J. Cahill, K. Mairin Hynes, Rebecca Trousil,Lisa A. Brooks, Mark A. McDaniel, Michelle Repice,Jiuqing Zhao, and Regina F. Frey, “Multiyear,multi-instructor evaluation of a large-class interactive-engagement curriculum,” Phys. Rev. ST Phys. Educ.Res. , 020101 (2014).[44] D. 
TABLE VII. Average z-score differences between male and female students, ∆, separated by instructor gender, as well as the t-tests and significances between these distributions.

                      Male Instructors           Female Instructors
                      ∆       t-test   p         ∆        t-test   p
Mech. Calc. Exam 1    0.242   6.921    < [...]
[...]

TABLE VIII. [...]

                      Male Instructors           Female Instructors
                      ∆       t-test   p         ∆        t-test   p
Mech. Calc.           0.012   0.282    0.778     0.031    0.449    0.654
E&M Calc.             0.141   1.351    0.177     -0.167   -2.693   0.007

TABLE IX. Average student performance perception (and standard error) for each course, as well as the t-test and significance between these distributions.

               Male            Female          t-test   p
Mech. Alg.     3.384 (0.081)   3.384 (0.062)   -0.003   0.998
E&M Alg.       3.342 (0.141)   2.886 (0.155)   2.154    0.035
Mech. Calc.    3.395 (0.048)   3.062 (0.073)   3.699    < [...]
[...]

TABLE X. [...]

               Male            Female          t-test   p
Mech. Alg.     3.767 (0.111)   3.813 (0.093)   -0.318   0.751
E&M Alg.       4.026 (0.188)   4.229 (0.146)   -0.828   0.410
Mech. Calc.    3.994 (0.063)   4.031 (0.091)   -0.316   0.752
E&M Calc.      3.964 (0.049)   4.005 (0.080)   -0.431   0.667

TABLE XI. Average student contribution valuation (and standard error) for each course, as well as the t-test and significance between these distributions.

               Male            Female          t-test   p
[...]
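As a rough consistency check on the questionnaire tables, a t statistic can be approximated from the reported group means and standard errors alone. The sketch below assumes an unequal-variance (Welch-style) statistic, t = (mean_male - mean_female) / sqrt(SE_male^2 + SE_female^2); the tables do not state which t-test variant was used, so this form and the function name `t_from_standard_errors` are assumptions for illustration, and small deviations from the tabulated values are expected.

```python
import math


def t_from_standard_errors(mean_a: float, se_a: float,
                           mean_b: float, se_b: float) -> float:
    """Approximate two-sample t statistic from means and standard errors.

    Assumes independent groups and an unequal-variance (Welch-style) form;
    this is an illustrative assumption, not the paper's stated procedure.
    """
    # The standard error of the difference of two independent means is the
    # square root of the sum of the squared standard errors.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff


if __name__ == "__main__":
    # Values taken from the E&M Alg. row of Table IX:
    # male 3.342 (0.141), female 2.886 (0.155), tabulated t-test 2.154.
    t_approx = t_from_standard_errors(3.342, 0.141, 2.886, 0.155)
    print(f"approximate t = {t_approx:.3f} (tabulated: 2.154)")
```

For the E&M Alg. row of Table IX this gives roughly 2.18, close to the tabulated 2.154. The residual gap is consistent with rounding in the reported means and standard errors, or with a pooled-variance test having been used instead.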