Do competent women receive unfavorable treatment?

Abstract

Do competent women receive less favorable treatment than equally competent men? I study this question in a laboratory experiment where unfavorable treatment has material consequences. I find that neither men nor women treat competent women less favorably; if anything, both men and women treat competent women slightly more favorably than equally competent men. The findings provide evidence that competent women may not necessarily receive unfavorable treatment in settings with material consequences, which may shed new light on hiring and promotion practices in labor markets.


Yuki Takahashi ∗

December 9, 2020

JEL Classification: C91, D91, J16, M51

Keywords: competence, gender bias, labor markets, laboratory experiment

∗ Department of Economics, University of Bologna. Email: [email protected]. I am grateful to Maria Bigoni, Natalia Montinari, and Siri Isaksson, whose feedback was essential for this project. I am also grateful to participants of the experiment for their participation and cooperation. Ingvild Almås, Laura Anderlucci, Tiziano Arduini, Francesca Barigozzi, Teodora Boneva, Enrico Cantoni, Giovanna d’Adda, Chiara Natalie Focacci, Margherita Fort, Catalina Franco, Astrid Kunze, Fabio Landini, Pascal Langenbach, Annalisa Loviglio, Valeria Maggian, Joshua Miller, Paola Profeta, Eugenio Proto, Tommaso Sonno, Sigrid Suetens, Alessandro Tavoni, Bertil Tungodden, ESA Experimental Methods Discussion group, and the University of Bologna’s PhD students all provided many helpful comments. This paper also benefited from participants’ comments at the Applied Young Economist Webinar, the BEEN Meeting, seminars at Ca’ Foscari University, the NHH, and the University of Bologna. Veronica Rattini and oTree help & discussion group kindly answered my questions about oTree programming. Lorenzo Golinelli provided excellent technical and administrative assistance. The pre-analysis plan is available at the OSF registry: https://osf.io/ypsmx. The experimental instructions are available in the online appendix: https://yukitakahashi1.github.io/files/CareerProgressionApp.pdf.

Introduction

One strand of the literature argues that people consider competent women as less likable than equally competent men (Heilman 2001; Rudman and Phelan 2008). This is also a view shared by several top female corporate executives. However, it is unclear whether being less likable has practical implications; that is, whether competent women receive unfavorable treatment in decisions such as hiring and promotion. Indeed, this question has been explored mostly by means of questionnaires and hypothetical decisions (Heilman et al. 2004; Phelan, Moss-Racusin, and Rudman 2008; Rudman and Fairchild 2004; Rudman 1998; Rudman et al. 2012).

Evidence from decisions with material consequences mainly comes from audit studies and is mixed: while Quadlin (2018) finds unfavorable treatment, Ceci and Williams (2015) and Williams and Ceci (2015) do not. One possible reason for this mixed evidence is employers’ wrong prior belief about competent women’s personality, which tends to be negative as evidenced by the literature: because employers have to work with their employees for a long period of time, they want to hire people whom they are comfortable working with. However, their prior must be updated once the employers see the actual job applicants in the interview. Also, in promotion decisions, employers or managers know a potential candidate very well and their prior belief must be irrelevant.

In this paper, I tackle this question by means of a controlled laboratory experiment. I use dictator game allocation as a measure of favorable and unfavorable treatment with clear material consequences and exogenously vary the recipient’s gender and competence. I measure competence by an IQ test, an attribute people care most about (Eil and Rao 2011; Zimmermann 2020). In the experiment, participants first work on an incentivized IQ test. After the test, participants are randomly assigned to a group of six and receive a ranking of their IQ within their group. Once they answer the comprehension questions about their IQ rank, three of the six members are randomly chosen to be dictators and play three rounds of dictator game with the other three members chosen to be recipients, observing the recipients’ facial photos and first names – both of which convey information about gender – and the IQ ranks.

Using dictator IQ fixed effects and exploiting random grouping of participants to address the endogeneity of participants’ IQ and recipients’ gender, I do not find a significant difference between dictators’ allocation to competent women and to competent men; if anything, dictators allocate slightly more to competent women. The point estimate of the difference is positive and statistically indistinguishable from 0. The lower bound of the 95% confidence interval of the difference is -3.7% of the dictator endowment, which is quantitatively much smaller (2.4-3.1 times smaller) in absolute

1. In her book Lean In: Women, Work, and the Will to Lead, Facebook’s Chief Operating Officer Sheryl Sandberg expresses her view as follows: “If a woman is competent, she does not seem nice enough. If a woman seems really nice, she is considered more nice than competent. Since people want to hire and promote those who are both competent and nice, this creates a huge stumbling block for women” (Sandberg 2013).

2. The experimental design, the hypotheses, and the empirical strategy are pre-registered at the OSF registry: https://osf.io/ypsmx. However, there are a number of changes to the pre-analysis plan discussed in appendix A.

3. The use of photos follows recent literature and allows the dictators to infer the gender of the recipients in a natural way as they would do in their daily life (Babcock et al. 2017; Coffman 2014; Isaksson 2018), but I address the possibility that recipients’ gender-specific characteristics (e.g. women may smile more often in a photo) affect dictators’ allocation.

Several alternative explanations are inconsistent with the results; most importantly, the results are not due to experimental manipulation failure, ex-post randomization failure, wrong identification assumptions, or lack of statistical power. These findings suggest that competent women do not receive unfavorable treatment in decisions involving material consequences such as hiring and promotion.

This paper mainly relates to two strands of literature. The first focuses on the tradeoff women face between being competent and being likable. The literature finds that people perceive female leaders (Heilman, Block, and Martell 1995; Heilman and Okimoto 2007; Rudman and Kilianski 2000) and competent women (Heilman et al. 2004; Rudman 1998) negatively. It also finds that people evaluate competent women negatively, but these results are obtained in set-ups without real consequences (Phelan, Moss-Racusin, and Rudman 2008; Rudman and Fairchild 2004; Rudman et al. 2012). However, the studies about evaluations of competent women with real consequences find mixed evidence: while Quadlin (2018) finds that top-performing female college students receive less favorable treatment in hiring than equally qualified male students, Ceci and Williams (2015) and Williams and Ceci (2015) find that qualified female applicants for assistant professor positions receive equal or more favorable treatment than equally qualified male applicants. My results suggest that the employers’ prior belief about competent women may be driving these mixed findings.

When the consequence of their evaluation is not immediately clear, people seem to evaluate women in traditionally male occupations more critically: Boring (2017) and Mengel, Sauermann, and Zölitz (2019) find that female university instructors receive lower student evaluations. There is also evidence that female economists’ work is undervalued (Koffi 2019; Sarsons et al. 2020) and female university faculty are less likely to be promoted (De Paola, Ponzo, and Scoppa 2018). However, these critical evaluations may simply reflect the lack of women in these occupations, so that people do not have enough prior information about women’s competence, rather than taste-based discrimination. Sarsons (2019) finds that female surgeons receive a more negative evaluation for their failure and Ditonto (2017) finds that voters care more about female politicians’ competence than male politicians’ competence. Also, Bohren, Imas, and Rosenberg (2019) find that while women initially receive lower credits than men in their contributions to an online mathematics discussion forum, they receive higher credits than men after they accumulate enough positive evaluations. My findings are compatible with the explanation that people do not have enough prior information about women’s competence, and they give fair evaluations to women once they show they are competent.

4. While dictators only see the recipients’ IQ relative to theirs and thus the competence measure is relative to theirs, dictators do not see their IQ at the time they play dictator games. Indeed, in the real world, we do not have an absolute measure of other people’s competence but evaluate relative to some benchmark. Nevertheless, I provide evidence that the distinction between relative and absolute competence does not matter for my results.

Experiment

The experiment consists of two parts as shown in figure 1; instructions for each part are only delivered at the end of the previous part. Participants receive a participation fee of 2.5€. Experimental instructions are available in the online appendix.

Figure 1: Overview of the experiment

Notes: This figure shows an overview of the experiment discussed in detail in section 2.1.

Pre-experiment: Random desk assignment & photo taking

After registration at the laboratory entrance, participants are randomly assigned to a desk. Before the start of part 1, participants take their facial photos at a photo booth and enter their first name on their computer. After that, we experimenters go to each participant’s desk to check that their photo and first name match them, to assure all participants that other participants’ photos and first names are real, following Isaksson (2018).

Part 1: IQ test

In part 1, participants work on 9 incentivized IQ test questions for 9 minutes. I use Bilker et al. (2012)’s form A 9-item Raven test, which predicts one’s IQ measured with the full-length Raven test with more than 90% accuracy. Participants receive 0.5€ for each correct answer. They receive information about how many IQ test questions they have solved correctly only at the end of the experiment. I use IQ as the measure of competence because previous studies find it is an attribute people care most about (Eil and Rao 2011; Zimmermann 2020).

After the IQ test, participants make an incentivized guess on the number of IQ test questions they have solved correctly: they receive 0.5€ if their guess is correct. The answer to this question measures their over-confidence level. They receive feedback on this guess only at the end of the experiment.

Following Eil and Rao (2011), six participants are randomly grouped, and they are informed of the ranking of their IQ relative to other group members. Ties are broken randomly. They then have to answer a set of comprehension questions as shown in figure 2 in order to proceed to the next part.
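The within-group ranking with random tie-breaking, and the correct answers to the two comprehension questions, can be sketched as follows. This is only an illustration under stated assumptions: the function names `rank_with_random_ties` and `comprehension_answers` are hypothetical and are not taken from the experiment's oTree code, and the scores in the usage note are made up.

```python
import random


def rank_with_random_ties(scores, rng):
    """Return the rank (1 = best) of each group member; ties are broken at random.

    `scores` holds the number of correctly solved IQ questions per member.
    """
    order = list(range(len(scores)))
    rng.shuffle(order)                      # random order first ...
    order.sort(key=lambda i: -scores[i])    # ... so the stable sort breaks ties randomly
    ranks = [0] * len(scores)
    for position, member in enumerate(order, start=1):
        ranks[member] = position
    return ranks


def comprehension_answers(rank, group_size=6):
    """Number of group members who performed better and worse than a given rank."""
    return rank - 1, group_size - rank
```

For example, with hypothetical scores `[7, 9, 5, 7, 8, 6]` the two members who scored 7 receive ranks 3 and 4 in random order, and a participant with rank 4 in a group of six should answer 3 (performed better) and 2 (performed worse).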

Part 2: Dictator game

In part 2, three participants in each group are randomly chosen to become dictators and the other three participants become recipients. Dictators are paired with the three recipients in their group one by one in a random order, receive an endowment, and play a dictator game. When they play the dictator game, dictators observe the recipients’ facial photo, first name, and IQ rank. The use of photos allows me to convey information about the gender of other participants in a natural way as in the recent literature (Babcock et al. 2017; Coffman 2014; Isaksson 2018). Dictators are also told that their allocation decisions are anonymous except for the experimenters: they are told that their allocation is paid to the recipients as a “top-up” to their earnings. Dictators decide the allocation by moving a cursor on a slider where the cursor is initially hidden to prevent anchoring, as shown in figure 3. I use a cursor to make it more cognitively demanding to figure out a fair allocation, which has been shown to increase self-interested decisions (Exley and Kessler 2019). I also vary the endowment across rounds to make each dictator game less repetitive: 7€ for the 1st and 3rd rounds, 5€ for the 2nd round. At the end of the experiment, one out of three allocations is randomly chosen for each participant as earnings for this part.

I also collect an indirect measure of dictators’ beliefs on how many IQ test questions the paired recipients have solved correctly. To prevent the belief elicitation from affecting or being affected by the dictator game, I exploit the random assignment of participants to dictators and recipients (derived from the random desk assignment) and use recipients’ beliefs as a proxy for dictators’ beliefs. Specifically, while dictators are playing the dictator game, recipients are paired with the other two recipients in the same group one by one in random order and make incentivized guesses on how many IQ test questions the paired recipients have solved correctly, observing the other two recipients’ facial photo, first name, and IQ rank. Each correct guess gives them 0.5€.

Figure 2: IQ rank assignment and the comprehension questions

[Screenshot of the feedback screen. It reads “Among your 6 group members including you, you received Rank 4.” and asks two questions: “Among your 6 group members, how many people performed better than you in the IQ test?” and “Among your 6 group members, how many people performed worse than you in the IQ test?”]

Notes: This figure shows an example of the IQ rank assignment and the comprehension questions. In this example, the participant was ranked 4th from the top within a group of 6 participants. Thus, the answer to the first question is 3 (three participants performed better in the IQ test) and the second question is 2 (two participants performed worse in the IQ test).

Post-experiment: Questionnaire

After the dictator game and guessing are over, participants are told their earnings from the IQ test, dictator game, and the guesses. Before receiving their earnings, participants answer a short questionnaire about their demographics that are used for balance tests and robustness checks.

Figure 3: Dictator’s allocation screen

[Screenshot of the allocation screen for round 1 of 3. It shows the recipient’s first name (Neve) and IQ rank (Rank 5) and reads “Please allocate the endowment between yourself and Neve. When you click the line below, a cursor appears. You can move the cursor by dragging it. Please move the cursor to your preferred position to determine the allocation.” above a slider running from “You” to “Neve”.]

Notes: This figure shows an example of a dictator’s allocation screen. In this example, the dictator is playing the first round and is paired with a recipient whose first name is Neve with IQ rank 5.

5. To address the non-anonymity of showing facial photo and first name, I ask participants how well they know the paired participants on a scale of 4 (did not know at all, saw before, knew but not very well, knew very well). I ask this question twice to make sure they do not answer randomly: right after the three dictator games or two guesses and in the post-experimental questionnaire.

6. For each dictator for each round, one of the three recipients in the same group is randomly chosen with replacement and the dictator allocates the endowment between themselves and the recipient. Thus, it is possible that two dictators play the dictator game with the same recipient in the same round. At the end of the dictator games, each participant has three allocations, one of which is randomly chosen for payment.
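The part 2 procedure from a single dictator's point of view (endowments of 7€, 5€, and 7€; a recipient drawn with replacement in each round, as footnote 6 describes; one of the three rounds drawn for payment) can be sketched as below. This is a simplified illustration, not the experiment's oTree implementation: the function name `play_dictator_rounds`, the data layout, and the example shares are hypothetical, and the sketch does not model how recipients' payments are determined.

```python
import random

ENDOWMENTS = (7.0, 5.0, 7.0)  # euros for rounds 1, 2, and 3, as stated in the text


def play_dictator_rounds(recipients, shares, rng):
    """Simulate three rounds for one dictator.

    recipients: identifiers of the three recipients in the dictator's group
    shares:     fraction of the endowment given in each round (hypothetical decisions)
    """
    rounds = []
    for endowment, share in zip(ENDOWMENTS, shares):
        partner = rng.choice(recipients)  # independent draw each round, so repeats are possible
        given = round(endowment * share, 2)
        rounds.append({
            "partner": partner,
            "endowment": endowment,
            "given": given,
            "kept": round(endowment - given, 2),
        })
    paid_round = rng.choice(rounds)  # one of the three allocations is paid out
    return rounds, paid_round
```

For instance, `play_dictator_rounds(["r1", "r2", "r3"], [0.4, 0.5, 0.0], random.Random(0))` returns three rounds in which the dictator gives 2.8€, 2.5€, and 0€ respectively, together with the round selected for payment.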

The experiment was computerized and programmed with oTree (Chen, Schonger, and Wickens 2016), and conducted in English during November-December 2019 at the Bologna Laboratory for Experiments in Social Science (BLESS). I recruited 390 students of the University of Bologna via ORSEE (Greiner 2015) who (i) were born in Italy, (ii) were available to participate in English experiments, and (iii) had not participated in gender-related experiments in the past (as far as I could check). The number of participants was based on the power simulation in the pre-analysis plan to achieve 80% power.

The average length of a session was 70 minutes including registration and payment. The average payment per participant was about 10€ including the participation fee and 1.5€ of gratuity for photo use in another experiment (which I asked for recipients only). I ran 24 sessions in total and the number of participants in each session varied from 12 to 30 and was a multiple of 6.

I limit participants to Italy-born students so that their first name and photo do not signal

7. I exclude the 1st session data because of the problem discussed in appendix A. Nevertheless, the results including the 1st session data give me the same conclusions and are available upon request.

and whom the dictator declared they knew “very well” at least once.

These data screenings leave me with 390 participants, 195 dictators, and 558 observations (with dictators’ allocation as the unit of observation). I estimate the following equation with OLS:

$$
Allocate_{ij} = \beta_0 + \beta_1 IQHigher_{ij} + \beta_2 Female_j + \beta_3 IQHigher_{ij} \ast Female_j + IQFE_i + X_{ij}\gamma + \epsilon_{ij} \tag{1}
$$

where each variable is defined as follows:

• $Allocate_{ij} \in [0, 1]$: dictator $i$’s allocation to recipient $j$ as a fraction of endowment.
• $IQHigher_{ij} \in \{0, 1\}$: an indicator variable that equals 1 if recipient $j$’s IQ is higher than dictator $i$’s.
• $Female_j \in \{0, 1\}$: an indicator variable that equals 1 if recipient $j$ is female.
• $IQFE_i := \sum_{l=2}^{9} \theta_l e_{li}$: fixed effects for the dictators’ IQ (number of IQ test questions they have solved correctly), where $e_{li} \in \{0, 1\}$ is an indicator variable that equals 1 if dictator $i$’s IQ is $l = 1, \dots, 9$, and 0 otherwise.
• $X_{ij}$: a set of additional covariates to increase statistical power and to address potential imbalance.
• $\epsilon_{ij}$: omitted factors that are correlated with dictator $i$’s allocation to recipient $j$ conditional on covariates.

Dictator’s IQ fixed effects are included following Zimmermann (2020) so that the coefficients in equation 1 capture allocation differences due to the recipients’ IQ, not the dictators’. I cluster standard errors at the dictator level (Liang and Zeger 1986) and apply Pustejovsky and Tipton (2018)’s small cluster bias adjustment to address potential inflation of the type I error rate due to moderate cluster size.

Table 1 shows what the coefficients in equation 1 identify. $\beta_1$ identifies the allocation difference between male recipients with higher and lower IQ, which captures dictators’ distributional preference, among other effects. $\beta_2$ identifies the allocation difference between female and male recipients with lower IQ, namely every difference due to the recipients being female (e.g. women smile more and dictators like to give more to smiling people due to closer social distance). $\beta_3$ identifies the interaction of these two effects. Therefore, the allocation difference between female and male recipients with higher IQ is identified by $\beta_2 + \beta_3$, while the main effect of interest, the same allocation difference after controlling for any gender-specific effects of the recipients, is identified by $\beta_3$. Note that only the relative IQ matters because dictators only observe the recipients’ IQ relative to themselves (and I control for dictators’ IQ). I elaborate on this point later.

8. Although it is easy to distinguish Italian and non-Italian sounding names, to make sure not to misclassify, I asked the laboratory manager, who was a native Italian, to check participants’ first names after each session.

9. The covariates include dictator characteristics (age, gender dummy, region of origin dummy, social science major dummy, STEM major dummy, post-bachelor dummy, over-confidence level), recipient characteristics (age, region of origin dummy), round fixed effects, and fixed effects for proximity between the dictator and the recipient. The full description of the covariates is in appendix B.

10. This is because people with different IQ (cognitive ability) may have a different distributional preference. For example, Almås et al. (2017) find that people from a low socio-economic family – which can be correlated with their cognitive ability – hold stronger egalitarian views than people from a middle or a high socio-economic family. Fisman et al. (2015) find that students in a top US law school – who presumably are smarter than average US citizens – are more meritocratic and more efficiency-oriented than average US citizens.

Table 1: Dictator’s allocation identified by equation 1

                     Recipient’s gender
Recipient’s IQ       Female                                     Male
Higher               $\beta_0 + \beta_1 + \beta_2 + \beta_3$    $\beta_0 + \beta_1$
Lower                $\beta_0 + \beta_2$                        $\beta_0$

Notes: This table shows what the coefficients in equation 1 identify. Each cell represents dictator’s allocation to recipients with higher (first row) or lower (second row) IQ and whose gender is female (first column) or male (second column).
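To make the mapping between the coefficients and the cells of Table 1 concrete, the following sketch fits the core of equation 1 on synthetic data. It is a simplified illustration under several assumptions, not the paper's estimation code: the data are simulated (not the experimental data), the dictator IQ fixed effects and the covariates $X_{ij}$ are omitted, the standard errors are plain Liang-Zeger (CR0) cluster-robust errors without the Pustejovsky and Tipton small cluster adjustment, and the confidence interval uses the normal critical value 1.96. The helper `ols_cluster` and all parameter values are hypothetical.

```python
import numpy as np


def ols_cluster(y, X, clusters):
    """OLS coefficients and a cluster-robust (CR0, Liang-Zeger) covariance matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        rows = clusters == g
        score = X[rows].T @ resid[rows]  # sum of x_i * e_i within one dictator
        meat += np.outer(score, score)
    return beta, XtX_inv @ meat @ XtX_inv


# Synthetic data: 200 hypothetical dictators with 3 allocations each.
rng = np.random.default_rng(0)
dictator = np.repeat(np.arange(200), 3)
iq_higher = rng.integers(0, 2, size=dictator.size)
female = rng.integers(0, 2, size=dictator.size)
dictator_effect = rng.normal(0.0, 0.1, size=200)[dictator]
allocate = (0.40 + 0.01 * iq_higher + 0.01 * female + 0.03 * iq_higher * female
            + dictator_effect + rng.normal(0.0, 0.2, size=dictator.size))

# Columns: constant, IQHigher, Female, IQHigher * Female.
X = np.column_stack([np.ones(dictator.size), iq_higher, female, iq_higher * female])
beta, cov = ols_cluster(allocate, X, dictator)

# beta_3 with a normal-approximation 95% confidence interval.
se3 = np.sqrt(cov[3, 3])
ci3 = (beta[3] - 1.96 * se3, beta[3] + 1.96 * se3)

# beta_2 + beta_3: female-male difference among recipients with higher IQ.
c = np.array([0.0, 0.0, 1.0, 1.0])
diff_high = c @ beta
se_diff_high = np.sqrt(c @ cov @ c)
```

Because the four indicator combinations saturate the model in this stripped-down version, the fitted coefficients reproduce the four cell means exactly: for example, the sample mean allocation to female recipients with higher IQ equals `beta.sum()`, matching the top-left cell of Table 1. With fixed effects and covariates added, as in equation 1, this exact equality no longer holds.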

Summary statistics

Table 2 summarizes the data after excluding participants and observations discussed in subsection 2.2. Looking at panels A and B, participants’ average IQ level (number of IQ test questions solved correctly) is about 7 (with a maximum of 9) and gender is roughly balanced. Also, dictators took nearly 2 minutes to solve the feedback questions on their IQ rank. Looking at panel C, most dictators did not know the paired recipients (after excluding pairs in which the dictator knew the recipient “very well”). Looking at panel D, an average dictator allocated 40% of their endowment to paired recipients, and the variation in allocation within each IQ level is as large as the overall variation in allocation. The latter indicates that there is enough variation in allocation I can exploit in my empirical specification (which uses dictator’s IQ fixed effects).

Figure C1 shows the empirical density (panel A) and empirical distribution (panel B) of dictators’ allocation to further elaborate on panel D of table 2. First, panel A shows that nearly 45% of dictators have chosen equal allocation. Second, the empirical distribution of giving in panel B resembles the empirical distribution of allocation in Bohnet and Frey (1999)’s one-way identification treatment, which also shows recipients’ faces to the dictators.
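The within-IQ variation referred to above can be computed by subtracting, from each allocation, the mean allocation among dictators with the same IQ level and then taking the sample standard deviation of the result. The sketch below illustrates this on made-up numbers; the function name `within_group_sd` and the example values are hypothetical.

```python
import numpy as np


def within_group_sd(values, groups):
    """Demean `values` within each group and return (demeaned values, sample SD)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    demeaned = values.copy()
    for g in np.unique(groups):
        mask = groups == g
        demeaned[mask] -= values[mask].mean()  # subtract the group (IQ level) mean
    return demeaned, demeaned.std(ddof=1)      # ddof=1 gives the sample standard deviation


# Hypothetical allocations by four dictators, two with IQ 6 and two with IQ 7.
demeaned, sd = within_group_sd([0.2, 0.4, 0.5, 0.7], [6, 6, 7, 7])
```

If the demeaned standard deviation is close to the raw standard deviation, as panel D of table 2 reports, most of the variation in allocation comes from differences among dictators with the same IQ level rather than from differences across IQ levels.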

Balance tests

For the coefficients in equation 1 to have a causal interpretation, the dictator’s IQ rank must be exogenous conditional on the dictator’s IQ fixed effects. Table C1 presents evidence for this claim. Also, I have to make sure that randomization was successful ex-post so that dictators face recipients of different gender and IQ in a balanced way conditional on the dictator’s IQ fixed effects. Tables C2 and C3 present evidence supporting this claim.

11. Demeaned standard deviation is the sample standard deviation of $\widetilde{Allocate}_{ik} = Allocate_{ik} - \overline{Allocate}_k$, where $Allocate_{ik}$ is the allocation by dictator $i$ whose IQ is $k$ and $\overline{Allocate}_k$ is the average of $Allocate_{ik}$ over dictators with IQ $k$.

Table 2: Summary statistics: Dictator data

                                   Mean       SD
Panel A: Dictators
  IQ level                         6.69     1.23
  IQ rank                          3.58     1.67
  Age                             23.47     2.72
  Female                           0.53     0.50
  From Emilia-Romagna              0.18     0.39
  Humanities                       0.46     0.50
  Social sciences                  0.19     0.40
  STEM                             0.35     0.48
  Post bachelor                    0.46     0.50
  Overconfidence                   0.43     0.76
  Time on feedback (sec.)        107.60    95.60
  Observations                      195

Panel B: Paired recipients
  IQ level                         6.84     1.16
  IQ rank                          3.42     1.74
  IQ higher                        0.53     0.50
  Age                             23.35     2.77
  Female                           0.47     0.50
  From Emilia-Romagna              0.20     0.40
  Observations                      558

Panel C: Proximity
  Did not know at all              0.96     0.19
  Knew but not well                0.03     0.17
  Saw before                       0.01     0.09
  Observations                      558

Panel D: Dictator’s allocation (fraction of endowment)
  Allocation                       0.40     0.24
  Allocation (demeaned)                     0.24
  Observations                      558

Notes: This table shows summary statistics for the full sample: the dictators’ and the paired recipients’ characteristics, how well dictators knew the paired recipients, and dictators’ allocation. Recipients whose name is non-Italian sounding and whom the dictator declared they knew “very well” at least once are not included. Standard deviation of demeaned allocation is calculated as the sample standard deviation of $\widetilde{Allocate}_{ik} = Allocate_{ik} - \overline{Allocate}_k$, where $Allocate_{ik}$ is the allocation by dictator $i$ whose IQ is $k$ and $\overline{Allocate}_k$ is the average of $Allocate_{ik}$ over dictators with IQ $k$.

Manipulation check

Figure 4 provides evidence that dictators respond differently to the recipients’ gender and IQ information: it shows dictators’ average allocation for each category of recipients – female recipients with higher IQ, female recipients with lower IQ, male recipients with higher IQ, and male recipients with lower IQ – along with their 95% confidence intervals. Looking at panel A, we see that dictators allocate most to female recipients with higher IQ, more to male recipients with higher IQ, and slightly more to female recipients with lower IQ – compared to male recipients with lower IQ. In addition, the allocations to female recipients with higher IQ and male recipients with lower IQ are statistically different at the 5% level and the allocations to female recipients with higher and lower IQ are marginally statistically different at 10%.

Figure 4: Dictators’ allocation by the recipients’ category

[Plots of dictators’ allocation (fraction of endowment) for four recipient categories: Female − IQ higher, Female − IQ lower, Male − IQ higher, Male − IQ lower. Panel A: All dictators (N=558). Panel B: Male dictators (N=260). Panel C: Female dictators (N=298).]

Notes: This figure shows dictators’ allocation as a fraction of endowment by recipients’ category along with their 95% confidence intervals for all dictators (panel A), male dictators (panel B), and female dictators (panel C). Confidence intervals are calculated with the standard errors clustered at the dictator level with Pustejovsky and Tipton (2018)’s small cluster bias adjustment. Horizontal lines over categories indicate statistically significant differences. Unit of observation: dictator’s allocation. Significance levels: * 10%, ** 5%, and *** 1%.

Panel B, which shows male dictators’ average allocation for each category of recipients,presents the same pattern as panel A but the diﬀerences are larger. In addition, some diﬀerencesare more statistically signiﬁcant despite the smaller sample size: the allocations to femalerecipients with higher IQ and male recipients with lower IQ are statistically diﬀerent at 5% sodo the allocations to female recipients with higher and lower IQ. Also, the allocations to malerecipients with higher and lower IQ are marginally statistically diﬀerent at 10%.On the other hand, female dictators’ average allocation for each category of recipientspresented in panel C shows a rather stark diﬀerence between female and male dictators. Whilemale dictators discriminate more based on ability and gender, female dictators do not. Indeed, all9he allocation diﬀerences are statistically insigniﬁcant even at 10%. This observation is consistentwith the existing literature that women are more inequality averse (Croson and Gneezy 2009)but inconsistent with Cappelen, Falch, and Tungodden (2019) who ﬁnd that women dislikemale losers more than men. In addition, female dictators’ allocation is overall higher than maledictators, consistent with existing dictator game experiments (Engel 2011). The diﬀerences inobservable characteristics between female and male dictators reported in panel A of table C4are also consistent with the existing literature. Table 3 presents the results with all dictators. Column 1 presents estimate without controllingfor dictator’s IQ and shows the direction of the bias without including dictator IQ ﬁxed eﬀects:although statistically insigniﬁcant, dictators with lower IQ allocate more to recipients withhigher IQ regardless of the recipients’ gender as shown by the coeﬃcient estimate on

IQHigher ij ,biasing the estimate upwards. From columns 2 to 5, I gradually increase the number of covariatesto check the robustness of my main speciﬁcation in column 5. They show that the coeﬃcientestimates are stable across 4 columns.Looking at column 5, the coeﬃcient estimate on IQHigher ij ∗ F emale j is positive andstatistically insigniﬁcant. To give a statistical claim about the insigniﬁcance, I use dualitybetween hypothesis testing and conﬁdence interval (Casella and Berger 2001) and examinewhat eﬀect size we can reject and whether it is quantitatively important as typically done inepistemology (e.g. Chaisemartin and Chaisemartin 2020). Thus turning to the 95% conﬁdenceinterval reported below the standard error estimate, the negative end is about -0.037, suggestingthat we can reject the eﬀect size lower than -3.7% of the dictator endowment at 5% signiﬁcancelevel. This value is very small, about 2.4-3.1 times smaller than the eﬀect size of typicaldictator game experiments that examine the role of social distance with university students (e.g.,Brañas-Garza et al. 2010; Charness and Gneezy 2008; Leider et al. 2010). While OLS only picks up the average eﬀect, these observations hold also in distribution.Panel A of ﬁgure 5 presents empirical CDFs of dictators’ allocation for each recipient category,demeaned by the dictator’s IQ ﬁxed eﬀects to give a causal interpretation. The ﬁgure showsthat the CDF of dictators’ allocation to female recipients with higher IQ (solid blue line) almost

12. Table C4 presents the same summary statistics as table 2 but separately for female and male dictators and their differences. It shows that female dictators are more likely to major in humanities, less likely to major in social sciences and STEM, less overconfident, and tend to allocate more to recipients – characteristics consistent with the literature on gender differences.

13. Charness and Gneezy (2008) examine how informing the recipient’s family name increases the dictators’ giving using a university student sample, and find an 8.9% increase in giving as a fraction of endowment. Leider et al. (2010) find using a university student sample that dictators increase giving by 11.42% as a fraction of endowment for their friends relative to someone living in the same student dormitory. Brañas-Garza et al. (2010) also find using a university student sample that dictators give about 10% more of their endowment to friends relative to other students in the same class.

14. Dictators’ allocation is demeaned for dictators’ IQ level so that the CDFs correspond to the regression results: $\widehat{Allocate}_{ik} = Allocate_{ik} - \overline{Allocate}_{k} + \overline{Allocate}$, where $Allocate_{ik}$ is allocation by dictator $i$ whose IQ is $k$, $\overline{Allocate}_{k}$ (the average of $Allocate_{ik}$ over $i \in k$) is average allocation of dictators with IQ $k$, and $\overline{Allocate}$ (the average of $Allocate_{ik}$ over all $i$) is average allocation by all dictators. $\overline{Allocate}$ is added to re-center the allocation. Although this re-centering leaves a few observations outside the 0-1 range, they do not alter the results and thus are trimmed for ease of visual inspection.

Table 3: The role of the recipients’ gender and IQ in dictators’ allocation: All dictators

Outcome: Dictator’s allocation (fraction of endowment)

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| IQHigher | 0.031 | 0.011 | 0.013 | 0.005 | 0.006 |
| | (0.031) | (0.033) | (0.033) | (0.033) | (0.034) |
| | [-0.030, 0.093] | [-0.054, 0.075] | [-0.053, 0.078] | [-0.059, 0.070] | [-0.061, 0.072] |
| Female | 0.018 | 0.014 | 0.014 | 0.007 | 0.006 |
| | (0.027) | (0.027) | (0.027) | (0.026) | (0.026) |
| | [-0.037, 0.072] | [-0.040, 0.067] | [-0.040, 0.068] | [-0.044, 0.058] | [-0.045, 0.057] |
| IQHigherxFemale | 0.024 | 0.027 | 0.026 | 0.034 | 0.035 |
| | (0.037) | (0.037) | (0.037) | (0.036) | (0.037) |
| | [-0.048, 0.097] | [-0.045, 0.100] | [-0.048, 0.099] | [-0.037, 0.105] | [-0.037, 0.107] |
| Dictator IQ FE | - | ✓ | ✓ | ✓ | ✓ |
| Round FE | - | - | ✓ | ✓ | ✓ |
| Proximity FE | - | - | ✓ | ✓ | ✓ |
| Dictator controls | - | - | - | ✓ | ✓ |
| Recipient controls | - | - | - | - | ✓ |
| Female+IQHigherxFemale | 0.042 | 0.041 | 0.04 | 0.041 | 0.041 |
| | (0.026) | (0.026) | (0.026) | (0.026) | (0.026) |
| | [-0.009, 0.093] | [-0.01, 0.092] | [-0.012, 0.091] | [-0.01, 0.092] | [-0.011, 0.093] |
| Outcome Mean | 0.403 | 0.403 | 0.403 | 0.403 | 0.403 |
| Outcome SD | 0.239 | 0.239 | 0.239 | 0.239 | 0.239 |
| R-squared | 0.011 | 0.025 | 0.028 | 0.079 | 0.086 |
| Observations | 558 | 558 | 558 | 558 | 558 |
| Clusters | 195 | 195 | 195 | 195 | 195 |

Notes: This table shows OLS estimates of the role of the recipients’ gender and IQ in dictators’ allocation. The outcome variable is dictators’ allocation as a fraction of endowment. The main specification is column 5, which includes all covariates (see the main text for detail). Columns 2-4 provide robustness of the main specification by excluding some covariates and column 1 shows the bias of not including dictator IQ fixed effects. Joint statistical significance of the coefficient estimate on Female+IQHigherxFemale is calculated using a t-test. The standard error (in parentheses) and the 95% confidence interval (in brackets) are reported below each coefficient estimate. The standard errors are clustered at the dictator level with Pustejovsky and Tipton (2018)’s small cluster bias adjustment. R-squared is net of the dictator IQ fixed effects. Unit of observation: dictator’s allocation. Significance levels: * 10%, ** 5%, and *** 1%.

always lies to the right of the other CDFs (although all CDFs are statistically indistinguishable from each other at the 5% significance level), suggesting that people do not treat competent women less favorably than competent men.

The results also hold separately for male and female dictators. Column 1 of table 4 presents results with male dictators only and column 2 results with female dictators only, both with full controls. First, the coefficient estimate on $IQHigher_{ij} \times Female_{j}$ is positive and statistically insignificant for both male and female dictators. Second, while the 95% confidence interval is wider because the sample size is reduced by about half, we can still reject at the 5% significance level an effect size lower than -9.0% for male dictators and -3.5% for female dictators. -9.0% is still the magnitude of the effect size of typical dictator game experiments. As with the full sample estimate, these observations also hold in distribution as reported in panel B (male dictators) and in panel C (female dictators) of figure 5. For both male dictators and female dictators, the CDF of dictators’ allocation to female recipients with higher IQ (solid blue line) almost always lies to the right of the other CDFs (although all CDFs are statistically indistinguishable from each other at the 5% significance level), suggesting that neither men nor women treat competent women less favorably than competent men.

Figure 5: CDFs of dictators’ allocation by the recipients’ category (demeaned)

Panel A: All dictators (N=558, Kruskal-Wallis simulated p-value=0.372). Panel B: Male dictators (N=260, Kruskal-Wallis simulated p-value=0.120). Panel C: Female dictators (N=298, Kruskal-Wallis simulated p-value=0.221). Vertical axis: cumulative probability.

Legend (recipient): Female-IQ higher, Female-IQ lower, Male-IQ higher, Male-IQ lower.

Notes: These figures show the empirical distribution of demeaned dictators’ allocation by recipients’ category for all dictators (panel A), male dictators (panel B), and female dictators (panel C). Demeaning was done with respect to the dictators’ IQ so that the CDFs have causal interpretation: $\widehat{Allocate}_{ik} = Allocate_{ik} - \overline{Allocate}_{k} + \overline{Allocate}$, where $Allocate_{ik}$ is allocation by dictator $i$ whose IQ is $k$, $\overline{Allocate}_{k}$ (the average of $Allocate_{ik}$ over $i \in k$) is average allocation of dictators with IQ $k$, and $\overline{Allocate}$ (the average of $Allocate_{ik}$ over all $i$) is average allocation by all dictators. $\overline{Allocate}$ is added to re-center the allocation. Values below 0 and above 1 are trimmed for ease of visual inspection, but including those observations does not alter my results (there are only a few observations outside the 0-1 range). Kruskal-Wallis simulated p-values are calculated using randomization inference (Young 2019) to address arbitrary dependency among observations with 2000 draws under the null hypothesis of no location difference (i.e. all CDFs coincide). Unit of observation: dictator’s allocation.
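The demeaning and re-centering described in the notes to figure 5 can be sketched as follows. This is a minimal illustration in Python, not the code used for the paper: the function names and the toy data are hypothetical, and group means are assumed to be taken over observations.

```python
from collections import defaultdict


def demean_by_iq(allocations, iq_levels):
    """Subtract each IQ group's mean allocation and add back the grand mean.

    Hypothetical helper: `allocations` are fractions of endowment and
    `iq_levels` are the corresponding dictators' IQ levels, one per observation.
    """
    if len(allocations) != len(iq_levels):
        raise ValueError("allocations and iq_levels must have the same length")
    if not allocations:
        return []

    grand_mean = sum(allocations) / len(allocations)

    # Accumulate totals and counts per IQ level to get the group means.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, level in zip(allocations, iq_levels):
        totals[level] += value
        counts[level] += 1
    group_mean = {level: totals[level] / counts[level] for level in totals}

    # Demeaned allocation = allocation - group mean + grand mean (re-centering).
    return [
        value - group_mean[level] + grand_mean
        for value, level in zip(allocations, iq_levels)
    ]


def trim_to_unit_interval(values):
    """Keep only values inside [0, 1], as done for visual inspection."""
    return [v for v in values if 0.0 <= v <= 1.0]


# Toy data (made up for illustration): two IQ levels, two observations each.
toy_allocations = [0.2, 0.4, 0.5, 0.9]
toy_iq_levels = [5, 5, 7, 7]
toy_demeaned = demean_by_iq(toy_allocations, toy_iq_levels)
toy_trimmed = trim_to_unit_interval(toy_demeaned)
```

After this transformation every IQ group has the same mean allocation (the grand mean), so differences between the plotted CDFs cannot come from differences in average giving across dictator IQ levels.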

Table 4: The role of the recipients’ gender and IQ in dictators’ allocation: Robustness checks

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Outcome | Dictator’s allocation (fraction of endowment) | Dictator’s allocation (fraction of endowment) | Dictator’s allocation (fraction of endowment) | Dictator’s allocation (fraction of endowment) | Belief on IQ |
| Sample | Male | Female | Over-confident | Non-over-confident | Evaluator |
| IQHigher | 0.048 | -0.049 | 0.032 | -0.032 | 0.232 |
| | (0.055) | (0.042) | (0.048) | (0.049) | (0.303) |
| | [-0.062, 0.158] | [-0.132, 0.034] | [-0.065, 0.128] | [-0.130, 0.067] | [-0.371, 0.834] |
| Female | 0.014 | -0.014 | -0.007 | 0.013 | -0.352 |
| | (0.034) | (0.037) | (0.033) | (0.042) | (0.292) |
| | [-0.054, 0.082] | [-0.089, 0.061] | [-0.073, 0.060] | [-0.072, 0.098] | [-0.931, 0.226] |
| IQHigherxFemale | 0.031 | 0.057 | 0.038 | 0.046 | 0.512 |
| | (0.061) | (0.046) | (0.050) | (0.054) | (0.392) |
| | [-0.090, 0.152] | [-0.035, 0.148] | [-0.060, 0.136] | [-0.063, 0.154] | [-0.261, 1.286] |
| Female+IQHigherxFemale | 0.045 | 0.042 | 0.031 | 0.059 | 0.16 |
| | (0.047) | (0.029) | (0.035) | (0.038) | (0.257) |
| | [-0.048, 0.138] | [-0.015, 0.1] | [-0.037, 0.1] | [-0.016, 0.133] | [-0.346, 0.666] |
| Outcome Mean | 0.369 | 0.432 | 0.385 | 0.427 | 6.342 |
| Outcome SD | 0.253 | 0.223 | 0.241 | 0.235 | 1.89 |
| R-squared | 0.151 | 0.084 | 0.112 | 0.151 | 0.097 |
| Observations | 260 | 298 | 325 | 233 | 368 |
| Clusters | 91 | 104 | 115 | 80 | 193 |

Notes: This table shows OLS estimates of the role of the recipients’ gender and IQ in dictators’ allocation for male and female dictators (columns 1-2), overconfident and non-overconfident dictators (columns 3-4), and evaluators’ belief on the recipients’ IQ (column 5). The outcome variable is dictators’ allocation as a fraction of endowment in columns 1-4 and evaluators’ belief on the recipients’ IQ level in column 5. All specifications include dictator IQ fixed effects, round fixed effects, proximity fixed effects, dictator (or evaluator) controls, and recipient controls, except columns 1-2 where the dictator’s gender dummy is excluded and columns 3-4 where the dictator’s overconfidence measure is excluded. Joint statistical significance of the coefficient estimate on Female+IQHigherxFemale is calculated using a t-test. The standard error (in parentheses) and the 95% confidence interval (in brackets) are reported below each coefficient estimate. The standard errors are clustered at the dictator or the evaluator level with Pustejovsky and Tipton (2018)’s small cluster bias adjustment. R-squared is net of the dictator IQ fixed effects. Unit of observation: dictator’s allocation (columns 1-4) and evaluator’s belief (column 5). Significance levels: * 10%, ** 5%, and *** 1%.

While dictators only see the recipients’ IQ relative to their own, so that the competence measure is relative, dictators do not see their own IQ at the time they play the dictator games. Indeed, in the real world, we do not have an absolute measure of other people’s competence but evaluate it relative to some benchmark. Yet, if anything, overconfident people are likely to consider people whose competence is higher than their own as more competent in absolute terms than non-overconfident people do, after controlling for their actual competence.

In columns 3-4 of table 4, I present the results separately for overconfident dictators (dictators who guess their IQ higher than their actual IQ, column 3) and non-overconfident dictators (dictators who guess their IQ equal to or lower than their actual IQ, column 4). For both types of dictators, the coefficient estimate on $IQHigher_{ij} \times Female_{j}$ is positive and statistically insignificant, and the lower bound of the 95% confidence interval is almost identical (-6.0% for overconfident dictators and -6.3% for non-overconfident dictators). Thus, whether competence is perceived in relative or absolute terms does not matter for my main results.

Alternative explanations

Female dictators’ in-group preference

One competing explanation is female dictators’ favoritism towards people who belong to the same social group, or in-group preference (Tajfel and Turner 1979), which would bias my β estimates upward. However, this explanation is inconsistent with the data. First, I use the difference in allocation between lower IQ female and male recipients as a control group, which eliminates the recipients’ gender-specific allocation preference for the analysis with female dictators. Second, the results with male dictators, who do not have an in-group preference towards female recipients, still reject an effect size lower than that of a typical dictator game experiment studying the effect of social distance using a university student sample.

Distaste against lower IQ male recipients

Although I use the difference in allocation to lower IQ female and male recipients to control for any recipient gender-specific allocation preference, this may not be a clean control because people may have a negative bias against under-performing men (Cappelen, Falch, and Tungodden 2019; Moss-Racusin, Phelan, and Rudman 2010). This explanation is also inconsistent with the data. First, the allocation difference between higher IQ female and male recipients without using lower IQ female-male allocation differences, that is, the sum of the coefficients on $Female_{j}$ and $IQHigher_{ij} \times Female_{j}$ (reported as Female+IQHigherxFemale in tables 3 and 4), still suggests the same conclusion: we can reject at the 5% significance level an effect size lower than -1.1% of dictator endowment for all dictators (table 3, column 5), lower than -4.8% for male dictators (table 4, column 1), and lower than -1.5% for female dictators (table 4, column 2). Second, while these single-difference estimates do not control for the recipients’ gender-specific allocation preference, Cappelen, Falch, and Tungodden (2019) find that the distaste mostly comes from women, so the results with male dictators only should not be affected by this distaste.

A wrong belief that female recipients are less competent

My empirical specification compares female and male recipients with higher IQ. The identification fails if dictators consider female recipients as less competent than male recipients even if they have a higher IQ than dictators. Although this is unlikely, Fiske et al. (2002) find that people consider women as less competent than men where the competence measures include intelligence.

This explanation indeed does not apply to my sample. Column 5 of table 4 presents results from a regression where I replace dictators’ allocation with the belief of recipients (whom I call evaluators) about the other recipients’ IQ level, which proxies dictators’ belief. Recipients’ belief is a valid proxy for dictators’ belief because participants are randomly assigned to the roles of dictator and recipient and because both dictators and recipients face the same environment until the start of the dictator game. The estimate of β is positive albeit statistically insignificant, suggesting that dictators do not believe that higher IQ female recipients are less competent than higher IQ male recipients.

This belief analysis, however, points to a potentially interesting difference in people’s belief updating process about women’s and men’s competence: people may update women’s competence

15. Table C5 presents evidence that recipients and dictators do not differ in their observable characteristics and characteristics of paired recipients.

Experimental manipulation failure

The effect size becomes null if dictators do not respond to the recipients’ gender and IQ information. However, dictators in my sample do respond to the recipients’ gender and IQ information in statistically significant ways, as already shown in figure 4.

Ex-post randomization failure

My empirical specification cannot detect causal effects if either (i) dictators’ IQ rank is endogenous even conditional on the dictators’ IQ fixed effects or (ii) dictators with specific characteristics face recipients of a specific gender and/or with lower or higher IQ. However, both concerns are addressed by the random desk assignment. Also, tables C1, C2, and C3 show that the random assignment was successful even ex-post. While the recipients’ region of origin is unbalanced (table C3, column 10) – which can happen by the definition of type I error – I include the recipients’ region of origin dummy in my main specification, which controls for the imbalance nonparametrically. Table C6 presents results for various subsamples, and we can still reject effect sizes lower than -4.3% to -8.7% at the 5% significance level, which further addresses concerns about ex-post imbalance. Last, while I pool all the higher and lower IQ recipients even though dictators can also see the IQ rank differences, figure C2 shows that taking the IQ rank differences into account does not alter the results.

Wrong identification assumptions

Any causal inference relies on several assumptions, so a failure to reject the null hypothesis of no effect may reflect a true absence of an effect but may also arise because some identification assumptions are wrong. However, aside from those discussed thus far, I do not make any significant assumptions because my empirical specification is a simple double difference-in-means. I also apply Pustejovsky and Tipton (2018)’s small cluster bias adjustment to address the finite-sample bias of the standard error. Thus, it is unlikely that the failure to reject the null can be attributed to implausible identification assumptions. Note that I also show that I can reject a very small effect size using confidence intervals.

Lack of statistical power

When the power is low (type II error rate is high), the conﬁdenceinterval becomes wider. However, my conﬁdence interval can reject a very small eﬀect size at a5% signiﬁcance level. Also, while there is an ex-post minimum detectable eﬀect estimate, it issimply 2.8 times the standard error and mostly useful for cross-study comparison (McKenzie andOzier 2019); the information used in the conﬁdence interval is strictly larger than the informationused in the ex-post minimum detectable eﬀect.
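The relation between the ex-post minimum detectable effect and the confidence interval can be illustrated with a short calculation. The sketch below is a hedged illustration, not part of the original analysis: it assumes a two-sided 5% test and the conventional 80% power (the power level is an assumption of this sketch), uses a normal approximation, and plugs in the coefficient estimate and standard error on IQHigherxFemale from table 3, column 5. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def mde_multiplier(alpha=0.05, power=0.80):
    """Multiplier on the standard error for an ex-post MDE.

    Assumes a two-sided test of size `alpha` and the given `power`
    (80% is the conventional choice, assumed here).
    """
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2) + STANDARD_NORMAL.inv_cdf(power)


def ex_post_mde(standard_error, alpha=0.05, power=0.80):
    """Ex-post minimum detectable effect under the normal approximation."""
    return mde_multiplier(alpha, power) * standard_error


def normal_confidence_interval(estimate, standard_error, alpha=0.05):
    """Two-sided confidence interval under the normal approximation.

    The paper's intervals use a small-sample adjustment, so they can
    differ slightly from this approximation.
    """
    z = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Values taken from table 3, column 5 (IQHigherxFemale).
estimate, standard_error = 0.035, 0.037

multiplier = mde_multiplier()                     # roughly 2.8
mde = ex_post_mde(standard_error)                 # roughly 0.10 of endowment
lower, upper = normal_confidence_interval(estimate, standard_error)
```

The MDE depends only on the standard error, whereas the interval also uses the point estimate, which is why its lower bound (close to the reported -3.7%) is more informative about which negative effect sizes can be ruled out.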

16. In table C6, column 1 excludes dictators with IQ rank 1 and 6 who never face recipients with lower / higher IQ. Column 2 excludes dictator-recipient pairs in which the dictator knows the recipients even a little and column 3 pairs in which the dictator saw the recipients before.

17. Figure C2 shows OLS estimates of equation 1 but splitting $IQHigher_{ij}$ into 6 separate dummies indicating the recipients’ IQ rank differences relative to the dictators’. The lower/higher the recipient’s IQ, the more negative/positive their IQ rank difference. For brevity, the figure only plots the coefficient estimates on the interaction terms between the 6 separate $IQHigher_{ij}$ and $Female_{j}$, $\hat{\beta}$, along with their 95% confidence intervals.

Conclusion

This paper examines whether competent women receive unfavorable treatment compared to competent men. Using dictator game giving as a measure of favorable and unfavorable treatment and exogenously varying gender and competence measured by an IQ test, I show that people treat competent women no less favorably than competent men; if anything, people treat competent women slightly more favorably. The lower bound of my estimate is -3.7% of dictator endowment, which is much smaller in magnitude than the effect size of dictator game experiments studying the role of social distance. I also show that the experimental manipulation was successful, randomization was successful even ex-post, identification assumptions are plausible, and the experiment has sufficient statistical power.

This paper contributes to the literature in two ways. First, I provide evidence that, in the stylized environment where unfavorable treatment has material consequences, the argument that women face a tradeoff between being competent and being likable does not hold. This suggests that competent women may receive fair treatment in hiring and promotion if the results are externally valid. Second, while several studies show that women are more critically evaluated in traditionally male occupations, my results indicate that a plausible explanation for this evidence is people’s lack of a sufficiently strong prior about women’s competence in these occupations rather than taste-based gender discrimination.

Indeed, there is ample evidence that female leaders (Chakraborty and Serra 2019; Håkansson 2020) and competitors (Datta Gupta, Poulsen, and Villeval 2013) receive more aggressive treatment and receive less support from men (Born, Ranehill, and Sandberg 2020). My study is silent on gender discrimination in settings with intense interaction and competition between women and men; there is evidence that men hold motivated gender bias (Sinclair and Kunda 2000), and this is a topic for future research. Still, my results apply to vertical relationships such as workers vs. managers and employees vs. employers and provide a piece of evidence to consider in hiring and promotion practices in labor markets.

References

Almås, Ingvild, Alexander W. Cappelen, Kjell G. Salvanes, Erik Ø. Sørensen, and Bertil Tungodden. 2017. “Fairness and family background.” Politics, Philosophy & Economics 16 (2): 117–131.

Babcock, Linda, María P. Recalde, Lise Vesterlund, and Laurie Weingart. 2017. “Gender Differences in Accepting and Receiving Requests for Tasks with Low Promotability.” American Economic Review 107 (3): 714–747.

Bell, Robert M., and Daniel F. McCaffrey. 2002. “Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples.” Survey Methodology 28 (2): 169–181.

Bilker, Warren B., John A. Hansen, Colleen M. Brensinger, Jan Richard, Raquel E. Gur, and Ruben C. Gur. 2012. “Development of Abbreviated Nine-Item Forms of the Raven’s Standard Progressive Matrices Test.” Assessment 19 (3): 354–369.

Bohnet, Iris, and Bruno S. Frey. 1999. “Social Distance and Other-Regarding Behavior in Dictator Games: Comment.” American Economic Review 89 (1): 335–339.

Bohren, J. Aislinn, Alex Imas, and Michael Rosenberg. 2019. “The Dynamics of Discrimination: Theory and Evidence.” American Economic Review 109 (10): 3395–3436.

Boring, Anne. 2017. “Gender biases in student evaluations of teaching.” Journal of Public Economics 145: 27–41.

Born, Andreas, Eva Ranehill, and Anna Sandberg. 2020. “Gender and Willingness to Lead: Does the Gender Composition of Teams Matter?” The Review of Economics and Statistics.

Brañas-Garza, Pablo, Ramón Cobo-Reyes, María Paz Espinosa, Natalia Jiménez, Jaromír Kovářík, and Giovanni Ponti. 2010. “Altruism and social integration.” Games and Economic Behavior 69 (2): 249–257.

Cappelen, Alexander, Ranveig Falch, and Bertil Tungodden. 2019. The Boy Crisis: Experimental Evidence on the Acceptance of Males Falling Behind. HCEO Working Paper 2019-014.

Casella, George, and Roger L. Berger. 2001. Statistical Inference.

Frontiers in Psychology

Clinical Infectious Diseases.

Chakraborty, Priyanka, and Danila Serra. 2019. Gender differences in top leadership roles: Does worker backlash matter? Working Paper.

Charness, Gary, and Uri Gneezy. 2008. “What’s in a name? Anonymity and social distance in dictator and ultimatum games.” Journal of Economic Behavior & Organization 68 (1): 29–35.

Chen, Daniel L., Martin Schonger, and Chris Wickens. 2016. “oTree—An open-source platform for laboratory, online, and field experiments.” Journal of Behavioral and Experimental Finance

The Quarterly Journal of Economics

Journal of Economic Literature 47 (2): 448–474.

Datta Gupta, Nabanita, Anders Poulsen, and Marie Claire Villeval. 2013. “Gender Matching and Competitiveness: Experimental Evidence.” Economic Inquiry 51 (1): 816–835.

De Paola, Maria, Michela Ponzo, and Vincenzo Scoppa. 2018. “Are Men Given Priority for Top Jobs? Investigating the Glass Ceiling in Italian Academia.” Journal of Human Capital

Political Behavior 39 (2): 301–325.

Eil, David, and Justin M. Rao. 2011. “The Good News-Bad News Effect: Asymmetric Processing of Objective Information about Yourself.” American Economic Journal: Microeconomics

Experimental Economics 14 (4): 583–610.

Exley, Christine L., and Judd B. Kessler. 2019. Motivated Errors. Working Paper.

Fiske, Susan T., Amy J. C. Cuddy, Peter Glick, and Jun Xu. 2002. “A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition.” Journal of Personality and Social Psychology 82 (6): 878–902.

Fisman, Raymond, Pamela Jakiela, Shachar Kariv, and Daniel Markovits. 2015. “The distributional preferences of an elite.” Science

Journal of the Economic Science Association 1 (1): 114–125.

Håkansson, Sandra. 2020. “Do women pay a higher price for power? Gender bias in political violence in Sweden.” Journal of Politics.

Heilman, Madeline E. 2001. “Description and Prescription: How Gender Stereotypes Prevent Women’s Ascent Up the Organizational Ladder.” Journal of Social Issues 57 (4): 657–674.

Heilman, Madeline E., Caryn J. Block, and Richard F. Martell. 1995. “Sex stereotypes: Do they influence perceptions of managers?” Journal of Social Behavior and Personality 10 (6): 237–252.

Heilman, Madeline E., and Tyler G. Okimoto. 2007. “Why are women penalized for success at male tasks?: the implied communality deficit.” The Journal of Applied Psychology 92 (1): 81–92.

Heilman, Madeline E., Aaron S. Wallen, Daniella Fuchs, and Melinda M. Tamkins. 2004. “Penalties for success: reactions to women who succeed at male gender-typed tasks.” The Journal of Applied Psychology 89 (3): 416–427.

Isaksson, Siri. 2018. It Takes Two: Gender Differences in Group Work. Working Paper.

Koffi, Marlène. 2019. Innovative Ideas and Gender Inequality. Working Paper.

Leider, Stephen, Tanya Rosenblat, Markus M. Möbius, and Quoc-Anh Do. 2010. “What do we Expect from our Friends?” Journal of the European Economic Association

Biometrika 73 (1): 13–22.

MacKinnon, James G., and Halbert White. 1985. “Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties.” Journal of Econometrics 29 (3): 305–325.

McKenzie, David, and Owen Ozier. 2019. Why ex-post power using estimated effect sizes is bad, but an ex-post MDE is not. Development Impact. The World Bank.

Mengel, Friederike, Jan Sauermann, and Ulf Zölitz. 2019. “Gender Bias in Teaching Evaluations.” Journal of the European Economic Association.

Moss-Racusin, Corinne A., Julie E. Phelan, and Laurie A. Rudman. 2010. “When men break the gender rules: Status incongruity and backlash against modest men.” Psychology of Men & Masculinity 11 (2): 140–151.

Phelan, Julie E., Corinne A. Moss-Racusin, and Laurie A. Rudman. 2008. “Competent yet Out in the Cold: Shifting Criteria for Hiring Reflect Backlash Toward Agentic Women.” Psychology of Women Quarterly 32 (4): 406–413.

Pustejovsky, James E., and Elizabeth Tipton. 2018. “Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models.” Journal of Business & Economic Statistics 36 (4): 672–683.

Quadlin, Natasha. 2018. “The Mark of a Woman’s Record: Gender and Academic Performance in Hiring.” American Sociological Review 83 (2): 331–360.

Rudman, Laurie A. 1998. “Self-promotion as a risk factor for women: The costs and benefits of counterstereotypical impression management.” Journal of Personality and Social Psychology 74 (3): 629–645.

Rudman, Laurie A., and Kimberly Fairchild. 2004. “Reactions to counterstereotypic behavior: the role of backlash in cultural stereotype maintenance.” Journal of Personality and Social Psychology 87 (2): 157–176.

Rudman, Laurie A., and Stephen E. Kilianski. 2000. “Implicit and Explicit Attitudes Toward Female Authority.” Personality and Social Psychology Bulletin 26 (11): 1315–1328.

Rudman, Laurie A., Corinne A. Moss-Racusin, Julie E. Phelan, and Sanne Nauts. 2012. “Status incongruity and backlash effects: Defending the gender hierarchy motivates prejudice against female leaders.” Journal of Experimental Social Psychology 48 (1): 165–179.

Rudman, Laurie A., and Julie E. Phelan. 2008. “Backlash effects for disconfirming gender stereotypes in organizations.” Research in Organizational Behavior 28: 61–79.

Sandberg, Sheryl. 2013. Lean In: Women, Work, and the Will to Lead. New York, NY: Knopf.

Sarsons, Heather. 2019. Interpreting Signals in the Labor Market: Evidence from Medical Referrals. Working Paper.

Sarsons, Heather, Klarita Gërxhani, Ernesto Reuben, and Arthur Schram. 2020. “Gender Differences in Recognition for Group Work.” Journal of Political Economy.

Sinclair, Lisa, and Ziva Kunda. 2000. “Motivated Stereotyping of Women: She’s Fine if She Praised Me but Incompetent if She Criticized Me.” Personality and Social Psychology Bulletin 26 (11): 1329–1342.

Tajfel, Henry, and John Turner. 1979. “An integrative theory of intergroup conflict.” In The social psychology of intergroup relations, edited by William G. Austin and Stephen Worchel, 33–47. Monterey, CA: Brooks Cole Publishing.

Williams, Wendy M., and Stephen J. Ceci. 2015. “National hiring experiments reveal 2:1 faculty preference for women on STEM tenure track.” Proceedings of the National Academy of Sciences

The Quarterly Journal of Economics

American Economic Review 110 (2): 337–361.

Appendix A Changes to the pre-analysis plan

In the initial design, recipients finished all the tasks except the post-questionnaire and left the laboratory before dictators received their IQ rank, so that dictators could play the dictator game without recipients in the same room. The allocation to the recipients was paid electronically as a “participation fee” for the online post-questionnaire, which was sent to recipients via email after the session was over. However, when I ran the 1st session with this initial design with 24 participants, dictators had to wait idly for about 20-30 minutes until recipients left the laboratory and seemed to have lost concentration during this period: about half of the dictators could not answer the comprehension questions about their IQ rank. Thus, I changed the design and let recipients stay in the laboratory while dictators played the dictator game. I looked at the 1st session data before making this change. I exclude the 1st session data in the analysis, but results including the 1st session data deliver the same conclusion and are available upon request. Also, the oTree code and instructions used for the 1st session are available upon request.

I also made the following minor changes after the 1st session:
1. I reduced the participation fee from 3€ to 2.5€ because participants earned more than I expected in the IQ test.
2. I added more explanation to the instructions on how the IQ rank was assigned and how to allocate endowment in the dictator game.
3. I asked participants’ major by having them choose among humanities, social sciences, natural sciences/mathematics, medicine, and engineering and type in their degree program name as a check, instead of letting them access the University of Bologna’s degree program website. This is because the computers in the laboratory sometimes did not accept iframe or prevented a pop-up to another website due to the security setting.

Other changes are the following:

Interpretation and focus:
1. I rephrased smartness as competence to better place my results in the literature.
2. I mainly discussed results for question 3.

Analysis:
3. I corrected the definition of $Lower_{ij}$. Consequently, I renamed it as $IQHigher_{ij}$ to make the meaning clearer.
4. I added distributional analysis (in figure 5) to examine whether the results also hold in distribution.
5. I used lm_robust instead of vcovCR to apply Pustejovsky and Tipton (2018)’s small cluster bias adjustment because vcovCR did not make the degrees of freedom adjustment.
6. I included in female and male dictator regressions the STEM major dummy and the Emilia-Romagna dummy because excluding them in regressions where the sample is conditioned by gender made little sense. The results are invariant to the exclusion of these covariates.
7. I divided dictators’ allocation by dictator endowment to facilitate the interpretation of the regression results (this does not affect my results because of the round fixed effects).

Appendix B Description of covariates

$X_{ij}$ in the main specification (equation 1) includes the following variables:

Dictator characteristics
• $Age_{i} \in \mathbb{N}$: dictator $i$’s age.
• $Female_{i} \in \{0, 1\}$: an indicator variable that equals 1 if dictator $i$ is female, 0 otherwise.
• $FromEmiliaRomagna_{i} \in \{0, 1\}$: an indicator variable that equals 1 if dictator $i$ is from the Emilia-Romagna region (where the University of Bologna is located), 0 otherwise.
• $SocialSciences_{i} \in \{0, 1\}$: an indicator variable that equals 1 if dictator $i$’s major is social sciences, 0 otherwise.
• $STEM_{i} \in \{0, 1\}$: an indicator variable that equals 1 if dictator $i$’s major is natural sciences/mathematics, engineering, or medicine; 0 otherwise.
• $PostBachelor_{i} \in \{0, 1\}$: an indicator variable that equals 1 if dictator $i$’s degree program is either master/post-bachelor, in the 4th year or beyond of a bachelor-master combined program, or PhD, 0 otherwise.
• $OverConfidence_{i} \in \{-1, 0, 1\}$: degree of dictator $i$’s overconfidence. It is equal to −1 if dictator $i$’s guess about the number of IQ test questions they correctly solved is lower than the actual number, 0 if equal to the actual number, and 1 if higher than the actual number.

Recipient characteristics
• $Age_{j} \in \mathbb{N}$: recipient $j$’s age.
• $FromEmiliaRomagna_{j} \in \{0, 1\}$: an indicator variable that equals 1 if recipient $j$ is from the Emilia-Romagna region, 0 otherwise.

Fixed effects
• $\sum_{l=2}^{3} r_{l}$: round fixed effects where $r_{l} \in \{0, 1\}$ is an indicator variable that equals 1 if the round is equal to $l$ ($l = 1, 2, 3$), 0 otherwise.
• $\sum_{l=2}^{3} q_{lij}$: proximity fixed effects where $q_{lij} \in \{0, 1\}$ is an indicator variable showing the proximity between dictator $i$ and recipient $j$, and equals 1 if dictator $i$ does not know recipient $j$ at all ($l = 1$), has seen before ($l = 2$), knows but not very well ($l = 3$).
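The coding of the covariates described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the dictionary keys, and the lowercase major labels are hypothetical and do not come from the experiment’s oTree code or analysis scripts.

```python
# Majors counted as STEM in the covariate description (labels are assumed).
STEM_MAJORS = {"natural sciences/mathematics", "engineering", "medicine"}


def overconfidence(guessed_correct, actual_correct):
    """Return -1, 0, or 1 by comparing the guessed and actual IQ test scores."""
    if guessed_correct < actual_correct:
        return -1
    if guessed_correct > actual_correct:
        return 1
    return 0


def major_dummies(major):
    """Social sciences and STEM indicators; humanities is the omitted category."""
    return {
        "SocialSciences": int(major == "social sciences"),
        "STEM": int(major in STEM_MAJORS),
    }


def round_dummies(round_number):
    """Round fixed effects for rounds 2 and 3; round 1 is the omitted category."""
    if round_number not in (1, 2, 3):
        raise ValueError("round_number must be 1, 2, or 3")
    return {"r2": int(round_number == 2), "r3": int(round_number == 3)}


# Hypothetical dictator: guessed 7 correct answers but solved 9, studies medicine,
# and is observed in round 2.
example = {
    "OverConfidence": overconfidence(7, 9),
    **major_dummies("medicine"),
    **round_dummies(2),
}
```

Leaving out one category per set of indicators (humanities, round 1) avoids perfect collinearity with the intercept in the regression.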

18. In Italy, bachelor is a 3 year program.

Appendix C Additional figures and tables

Figure C1: Density and distribution of the dictators’ allocation

Panel A: Density (N=558). Panel B: Distribution (N=558). Horizontal axis: giving in the dictator game (fraction of endowment). Vertical axis: density (panel A) and cumulative probability (panel B).

Notes: These figures show the empirical density (panel A) and the empirical distribution (panel B) of the dictators’ allocation as a fraction of endowment. Recipients whose name is non-Italian sounding and whom the dictator declared they knew “very well” at least once are excluded. Unit of observation: dictator’s allocation.

Figure C2: The role of the recipients’ IQ and gender in dictators’ allocation: Taking into account IQ rank differences

Horizontal axis: recipient’s relative IQ rank (+3, +2, +1, −1, −2, −3). Vertical axis: $\hat{\beta}$.

Notes: This figure shows OLS estimates of the role of recipient’s gender and IQ in dictators’ allocation that take into account the IQ rank differences dictators observe by splitting $IQHigher_{ij}$ into 6 separate dummies indicating the recipients’ IQ rank differences relative to the dictators’. The lower/higher the recipient’s IQ, the more negative/positive their IQ rank difference. The specification includes dictator IQ fixed effects, round fixed effects, proximity fixed effects, dictator controls, and recipient controls. The outcome variable is dictators’ allocation as a fraction of endowment. For brevity, the figure only plots the coefficient estimates on the interaction terms between the 6 separate $IQHigher_{ij}$ and $Female_{j}$, $\hat{\beta}$, along with their 95% confidence intervals, which are calculated with standard errors clustered at the dictator level with Pustejovsky and Tipton (2018)’s small cluster bias adjustment. Unit of observation: dictator’s allocation.

Table C1: Balance test: IQ rank

| Outcome: | Age | Female | From Emilia-Romagna | Humanities | Social sciences | STEM | Post bachelor | Overconfidence |
|---|---|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
| IQ rank = 2 | 0.010 | 0.221* | 0.074 | -0.095 | 0.034 | 0.061 | 0.151 | 0.146 |
| | (0.796) | (0.128) | (0.104) | (0.130) | (0.088) | (0.130) | (0.127) | (0.200) |
| IQ rank = 3 | -0.300 | 0.139 | -0.007 | -0.101 | 0.183 | -0.081 | 0.183 | 0.160 |
| | (0.776) | (0.143) | (0.103) | (0.142) | (0.120) | (0.137) | (0.137) | (0.241) |
| IQ rank = 4 | -0.536 | 0.094 | 0.138 | -0.146 | 0.101 | 0.045 | 0.187 | 0.430* |
| | (0.894) | (0.148) | (0.116) | (0.148) | (0.123) | (0.148) | (0.145) | (0.258) |
| IQ rank = 5 | 0.534 | 0.092 | 0.062 | -0.220 | 0.166 | 0.054 | 0.061 | 0.158 |
| | (0.959) | (0.165) | (0.128) | (0.175) | (0.128) | (0.165) | (0.156) | (0.271) |
| IQ rank = 6 | -0.040 | 0.070 | 0.021 | -0.368* | 0.442*** | -0.074 | 0.013 | 0.346 |
| | (1.093) | (0.191) | (0.147) | (0.201) | (0.162) | (0.173) | (0.191) | (0.306) |
| Dictator IQ FE | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| F statistic | 0.571 | 0.634 | 0.704 | 0.697 | 1.91* | 0.626 | 0.739 | 0.83 |
| R-squared | 0.040 | 0.067 | 0.040 | 0.042 | 0.074 | 0.062 | 0.027 | 0.032 |
| Observations | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 195 |

Notes:

This table shows balance across dictators with different IQ ranks. The estimates are obtained by running OLS regressions of various dictator characteristics on IQ rank dummies with dictator IQ fixed effects. The F statistic shows the joint significance of the IQ rank = 2 to IQ rank = 6 dummies. HC2 heteroskedasticity-robust standard errors (MacKinnon and White 1985) with Bell and McCaffrey (2002)’s small sample bias adjustment are reported below each coefficient estimate. R-squared is net of dictator IQ fixed effects. Unit of observation: dictator. Significance levels: * 10%, ** 5%, and *** 1%.

Table C2: Balance test: Recipient’s category

Outcome:            Age           Female        From Emilia-  Humanities    Social        STEM          Post          Over-
                                                Romagna                     sciences                    bachelor      confidence
                    (1)           (2)           (3)           (4)           (5)           (6)           (7)           (8)
IQHigher            -0.429        0.001         0.105**       -0.065        0.106**       -0.041        -0.071        0.063
                    (0.350)       (0.064)       (0.048)       (0.065)       (0.051)       (0.060)       (0.063)       (0.107)
Female              -0.228        0.060         0.080*        -0.026        0.015         0.011         -0.043        0.040
                    (0.336)       (0.059)       (0.048)       (0.057)       (0.046)       (0.057)       (0.060)       (0.090)
IQHigher x Female   0.431         0.010         -0.148**      0.014         -0.063        0.049         0.069         -0.051
                    (0.458)       (0.082)       (0.064)       (0.081)       (0.062)       (0.079)       (0.084)       (0.129)
Dictator IQ FE      ✓             ✓             ✓             ✓             ✓             ✓             ✓             ✓

F statistic         0.522         1.078         2.074         0.505         1.731         0.661         0.417         0.119
R-squared           0.029         0.052         0.034         0.025         0.028         0.050         0.014         0.007
Observations        558           558           558           558           558           558           558           558
Clusters            195           195           195           195           195           195           195           195

Notes:

This table shows that dictators were matched with recipients of different gender and IQ in a balanced way even ex post. The estimates are obtained by running OLS regressions of various dictator characteristics on the covariates of interest with dictator IQ fixed effects. The F statistic shows the joint significance of all covariates. Standard errors clustered at the dictator level with Pustejovsky and Tipton (2018)’s small cluster bias adjustment are reported below each coefficient estimate. R-squared is net of dictator IQ fixed effects. Unit of observation: dictator-recipient pair. Significance levels: * 10%, ** 5%, and *** 1%.
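
As a sketch, the balance regression described in these notes could be written as below. The notation is an assumption made here for illustration and does not appear in the source: $X_{ij}$ is a characteristic of dictator $i$ in the pairing with recipient $j$, $\mu_{k(i)}$ is a fixed effect for the dictator’s IQ level $k$, $\gamma_1, \gamma_2, \gamma_3$ are the coefficients reported in the rows IQHigher, Female, and IQHigher x Female, and $\varepsilon_{ij}$ is the error term.

```latex
X_{ij} = \gamma_1 \, IQHigher_{ij} + \gamma_2 \, Female_j
       + \gamma_3 \, (IQHigher_{ij} \times Female_j)
       + \mu_{k(i)} + \varepsilon_{ij},
\qquad
H_0 : \gamma_1 = \gamma_2 = \gamma_3 = 0 .
```

Under this reading, the F statistic in each column tests $H_0$, and a failure to reject it is what the notes describe as balanced matching.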

Table C3: Balance test: Recipient’s category (cont.)

Outcome:            Age           From Emilia-  Dictator      Dictator      Dictator      Did not       Saw           Knew but
                    (recipient)   Romagna       game          game          game          know          before        not
                                  (recipient)   round 1       round 2       round 3       at all                      very well
                    (9)           (10)          (11)          (12)          (13)          (14)          (15)          (16)
IQHigher            -0.792**      0.188***      -0.084        -0.026        0.110*        -0.002        0.008         -0.006
                    (0.374)       (0.050)       (0.065)       (0.064)       (0.061)       (0.026)       (0.022)       (0.018)
Female              -0.284        0.025         -0.084        0.037         0.047         0.020         -0.011        -0.009
                    (0.344)       (0.038)       (0.062)       (0.058)       (0.059)       (0.020)       (0.017)       (0.010)
IQHigher x Female   0.626         -0.100        0.137         -0.084        -0.053        -0.020        0.005         0.014
                    (0.462)       (0.062)       (0.084)       (0.079)       (0.084)       (0.026)       (0.025)       (0.020)
Dictator IQ FE      ✓             ✓             ✓             ✓             ✓             ✓             ✓             ✓

F statistic         1.537         5.51***       0.941         0.89          1.207         0.666         0.415         1.071
R-squared           0.013         0.041         0.006         0.006         0.007         0.047         0.014         0.074
Observations        558           558           558           558           558           558           558           558
Clusters            195           195           195           195           195           195           195           195

Notes:

This table shows that dictators were matched with recipients of different gender and IQ in a balanced way even ex post. The estimates are obtained by running OLS regressions of various recipient characteristics and round and proximity dummies on the covariates of interest with dictator IQ fixed effects. The F statistic shows the joint significance of all covariates. Standard errors clustered at the dictator level with Pustejovsky and Tipton (2018)’s small cluster bias adjustment are reported below each coefficient estimate. R-squared is net of dictator IQ fixed effects. Unit of observation: dictator-recipient pair. Significance levels: * 10%, ** 5%, and *** 1%.

Table C4: Summary statistics: Dictator data by gender

                          Female              Male                Difference
                          Mean      SD        Mean      SD        p-value

Panel A: Dictators

IQ level                  6.52      1.20      6.89      1.24      0.04
IQ rank                   3.83      1.59      3.31      1.73      0.03
Age                       23.68     2.62      23.23     2.81      0.25
From Emilia-Romagna       0.18      0.39      0.19      0.39      0.94
Humanities                0.58      0.50      0.32      0.47      0.00
Social sciences           0.15      0.36      0.24      0.43      0.13
STEM                      0.27      0.45      0.44      0.50      0.01
Post bachelor             0.53      0.50      0.37      0.49      0.03
Overconfidence            0.31      0.78      0.56      0.72      0.02
Time on feedback (sec.)   107.67    89.88     107.52    102.26    0.99
Observations              104                 91

Panel B: Paired recipients

IQ level                  6.77      1.19      6.91      1.12      0.15
IQ rank                   3.39      1.75      3.45      1.74      0.72
IQ higher                 0.57      0.50      0.48      0.50      0.03
Age                       23.17     2.57      23.55     2.98      0.12
Female                    0.50      0.50      0.43      0.50      0.10
From Emilia-Romagna       0.15      0.36      0.25      0.43      0.01
Observations              298                 260

Panel C: Proximity

Did not know at all       0.98      0.15      0.95      0.23      0.07
Knew but not well         0.02      0.15      0.03      0.18      0.44
Saw before                0.00      0.00      0.02      0.14      0.02
Observations              298                 260

Panel D: Dictator’s allocation (fraction of endowment)

Allocation                0.43      0.22      0.37      0.25      0.00
Allocation (demeaned)               0.22                0.25
Observations              298                 260

Notes:

This table shows summary statistics separately for female and male dictators: the dictators’ and the paired recipients’ characteristics, how well dictators knew the paired recipients, and dictators’ allocation. Recipients whose name is non-Italian sounding and those whom the dictator declared to know “very well” at least once are not included. The standard deviation of the demeaned allocation is calculated as the sample standard deviation of $\widetilde{Allocate}_{ik} = Allocate_{ik} - \overline{Allocate}_k$, where $Allocate_{ik}$ is the allocation by dictator $i$ whose IQ is $k$ and $\overline{Allocate}_k = \frac{1}{N_k} \sum_{i \in k} Allocate_{ik}$ is the average allocation of the $N_k$ dictators with IQ $k$. P-values for the difference in means are calculated with the two-sample t-test with HC2 heteroskedasticity-robust standard errors (MacKinnon and White 1985) with Bell and McCaffrey (2002)’s small sample bias adjustment.

Table C5: Summary statistics: Evaluator data vs. dictator data

                                Evaluator           Dictator            Difference
                                Mean      SD        Mean      SD        p-value

Panel A: Evaluator / Dictator

IQ level                        6.84      1.14      6.69      1.23      0.21
IQ rank                         3.40      1.74      3.58      1.67      0.30
Age                             23.34     2.78      23.47     2.72      0.63
From Emilia-Romagna             0.20      0.40      0.18      0.39      0.76
Humanities                      0.34      0.48      0.46      0.50      0.02
Social sciences                 0.27      0.44      0.19      0.40      0.08
STEM                            0.39      0.49      0.35      0.48      0.42
Post bachelor                   0.49      0.50      0.46      0.50      0.48
Overconfidence                  0.49      0.75      0.43      0.76      0.42
Time on feedback (sec.)         93.26     83.96     107.60    95.60     0.12
Observations                    193                 195

Panel B: Paired recipients

IQ level                        6.84      1.16      6.84      1.16      1.00
IQ rank                         3.42      1.74      3.42      1.74      0.98
IQ higher                       0.50      0.50      0.53      0.50      0.46
Age                             23.35     2.80      23.35     2.77      0.99
Female                          0.47      0.50      0.47      0.50      0.99
From Emilia-Romagna             0.19      0.40      0.20      0.40      0.87
Observations                    368                 558

Panel C: Proximity

Did not know at all             0.98      0.14      0.96      0.19      0.08
Knew but not well               0.02      0.14      0.03      0.17      0.34
Saw before                      0.00      0.00      0.01      0.09      0.03
Observations                    368                 558

Panel D: Belief on the recipient’s IQ

Belief on IQ level              6.34      1.89
Belief on IQ level (demeaned)             1.87
Observations                    368

Notes:

This table shows summary statistics for the evaluators and dictators: the evaluators’/dictators’ and the paired recipients’ characteristics, how well evaluators/dictators knew the paired recipients, and evaluators’ beliefs. Recipients whose name is non-Italian sounding and those whom the dictator declared to know “very well” at least once are not included. P-values for the difference in means are calculated with the two-sample t-test with HC2 heteroskedasticity-robust standard errors (MacKinnon and White 1985) with Bell and McCaffrey (2002)’s small sample bias adjustment.

Table C6: The role of the recipients’ gender and IQ in the dictators’ allocation: Further robustness checks

Outcome: Dictator’s allocation (fraction of endowment)

Sample:                     Excluding           Excluding           Excluding
                            IQ rank 1 and 6     proximity 3         proximity 2 and 3
                            (1)                 (2)                 (3)
IQHigher                    0.006               0.011               0.005
                            (0.036)             (0.033)             (0.034)
                            [-0.065, 0.077]     [-0.056, 0.077]     [-0.062, 0.073]
Female                      0.019               0.006               0.008
                            (0.029)             (0.026)             (0.027)
                            [-0.040, 0.077]     [-0.045, 0.058]     [-0.046, 0.062]
IQHigher x Female           0.001               0.029               0.029
                            (0.044)             (0.037)             (0.038)
                            [-0.087, 0.088]     [-0.043, 0.102]     [-0.047, 0.104]
Female + IQHigher x Female  0.019               0.036               0.037
                            (0.034)             (0.026)             (0.026)
                            [-0.047, 0.086]     [-0.016, 0.087]     [-0.015, 0.088]
Outcome Mean                0.412               0.402               0.404
Outcome SD                  0.234               0.239               0.239
R-squared                   0.102               0.083               0.079
Observations                386                 553                 537
Clusters                    135                 195                 194

Notes: