The Gender Pay Gap Revisited with Big Data: Do Methodological Choices Matter?

Abstract

The vast majority of existing studies that estimate the average unexplained gender pay gap use unnecessarily restrictive linear versions of the Blinder-Oaxaca decomposition. Using a notably rich and large data set of 1.7 million employees in Switzerland, we investigate how the methodological improvements made possible by such big data affect estimates of the unexplained gender pay gap. We study the sensitivity of the estimates with regard to i) the availability of observationally comparable men and women, ii) model flexibility when controlling for wage determinants, and iii) the choice of different parametric and semi-parametric estimators, including variants that make use of machine learning methods. We find that these three factors matter greatly. Blinder-Oaxaca estimates of the unexplained gender pay gap decline by up to 39% when we enforce comparability between men and women and use a more flexible specification of the wage equation. Semi-parametric matching yields estimates that when compared with the Blinder-Oaxaca estimates, are up to 50% smaller and also less sensitive to the way wage determinants are included.


Anthony Strittmatter † Conny Wunsch ‡

February 22, 2021


Keywords: Gender Inequality, Gender Pay Gap, Common Support, Model Specification, Matching Estimator, Machine Learning.

JEL classification: J31, C21

∗ We acknowledge helpful comments by Philipp Bach, Marina Bonaccolto-Töpfer, Christina Felfe, Martin Huber, Pat Kline, Michael Knaus, Matthias Krapf, and Michael Lechner, seminar participants at the University of Basel, University of Linz, and LISER, Luxemburg, as well as conference participants at EALE/SOLE 2020, Verein für Socialpolitik 2020, and the IAB 2020 workshop on “Machine Learning in Labor, Education, and Health Economics”. Anthony Strittmatter gratefully acknowledges financial support from the Swiss National Science Foundation (Spark Project 190422) and the French National Research Agency (LabEx Ecodec/ANR-11-LABX-0047). The authors are solely responsible for the analysis and the interpretation thereof.

† CREST-ENSAE, Institut Polytechnique Paris, France; CESifo, Munich, Germany; email: [email protected].

‡ Faculty of Business and Economics, University of Basel; University St. Gallen, Switzerland; email: [email protected].

Introduction

Achieving gender equality, especially equality in pay, is among the top priorities for governments in many countries. Measuring unexplained inequalities in pay between women and men, the so-called unexplained gender pay gap, has been the subject of an extensive literature for more than half a century (see Blau and Kahn, 2000, 2017; Goldin and Mitchell, 2017; Kunze, 2018; Olivetti and Petrongolo, 2016, for comprehensive reviews). A large literature focuses on understanding the sources of inequality in pay by studying the sensitivity of the gender pay gap with regard to the inclusion of important wage determinants. A much smaller literature focuses on the impact of methodological choices. We contribute to the latter literature by investigating the sensitivity of the estimated gender pay gap with regard to three dimensions: enforcement of overlap in wage determinants across gender, flexibility of model specification, and estimation method. Our results suggest that these methodological choices can reduce the estimated unexplained gender pay gap by as much as 50%, even when keeping relevant wage determinants fixed across estimates. This suggests that methodological choices are at least as relevant for gender pay gap estimations as considerations about wage determinants.

Weichselbaumer and Winter-Ebmer (2005) and Van der Velde et al. (2015) document that the vast majority of existing studies use a linear version of the Blinder-Oaxaca decomposition (BO, Blinder, 1973; Oaxaca, 1973) to estimate the mean unexplained gender pay gap (see Fortin et al., 2011, for a comprehensive review of BO and other decomposition methods). Thus, BO estimates serve as the key input for policy makers who aim to achieve equality in pay. However, most applications of BO impose a number of restrictions that may not be realistic. Firstly, they typically use relatively inflexible functional forms for the wage equation.
For example, the returns to education are often assumed to be the same across occupations, age and experience, which contradicts both theory and empirical evidence (see, e.g., Lemieux, 2014). Secondly, standard applications of BO do not account for common support violations, i.e., the lack of observationally comparable men for every woman. BO extrapolates into regions without support based on the assumed functional form. If the functional form is misspecified and there is lack of support, then this extrapolation may lead to bias. Thirdly, BO restricts heterogeneity in gender pay gaps to variable-specific differences in coefficients. Any heterogeneity that is not captured by this may bias estimates of the mean gender pay gap. This is particularly relevant, because heterogeneity in gender pay gaps is widely acknowledged in the literature (see, e.g., Bach et al., 2018; Barón and Cobb-Clark, 2010; Bonjour and Gerfin, 2001; Chernozhukov et al., 2013, 2018a; Goldin, 2014).

Of course, these restrictions can easily be relaxed when working with large data sets. Researchers can model the inclusion of control variables in more flexible ways. They can check and enforce common support ex ante, even for parametric estimators like BO. Furthermore, they can choose more flexible estimators that, unlike BO, do not restrict pay gap heterogeneity. Surprisingly though, these adjustments are rarely implemented in applied work. Therefore, an important question we explore in this study is whether these adjustments really matter in practice. To answer this question, we exploit a very large data set that covers about 37,000 establishments with individual data on more than 1.7 million employees in Switzerland, which covers almost one third of all Swiss employees. The data allow us to take full advantage of the methodological improvements that are possible with existing methods.
In particular, they allow us to go as far as exact matching on all elements of the rich set of observed wage determinants with resulting cells that are still large enough for meaningful analysis after enforcing full support.

We start by analyzing common support and the gender pay gap obtained from exact matching applying the technique of Ñopo (2008). Thereafter, we estimate the mean unexplained gender pay gap with different parametric and semi-parametric methods and for various samples that differ in how strictly we impose common support. As estimators, we consider a linear regression model (LRM) with a dummy for women, BO, inverse probability weighting (IPW), augmented IPW (AIPW) as a doubly robust mixture between BO and IPW, propensity score matching (PSM) and a combination of exact matching and PSM (EXPSM). For each estimator, we consider three model specifications that differ in how flexibly we include the observed wage determinants: (i) the baseline model contains dummy variables for all categories and quadratic terms for the continuous variables (up to 117 control variables), (ii) the full model additionally contains higher-order polynomials as well as a large number of interactions between wage determinants (up to 615 control variables), and (iii) the machine learning model employs LASSO estimation techniques (Tibshirani, 1996) to select the relevant wage determinants in a data-driven way (see Hastie et al., 2009, for an introduction to LASSO). Specifically, we implement the post-double-selection procedure (Belloni et al., 2013), double-machine-learning (Chernozhukov et al., 2018b), and T-learner (Künzel et al., 2019). Furthermore, we study the private and public sectors separately. The two sectors are subject to different degrees of labor market regulation and they attract different types of workers, both of which influence the size of observed gender pay gaps (see, e.g., Barón and Cobb-Clark, 2010).

With this study, we contribute to the existing literature in several important ways.
Firstly, we conduct the first comprehensive analysis of common support in the context of the gender pay gap. Secondly, we are the first to study how functional form restrictions regarding the inclusion of wage determinants affect pay gap estimates both within and across different estimators. By including machine learning techniques for data-driven model specification for all estimators, we also employ methods that have never before been considered in the context of the gender pay gap. Thirdly, we study a much more comprehensive set of estimators than any previous study. Finally, we are the first to consider all of these dimensions collectively and to vary them systematically. In total, we estimate the unexplained gender pay gap for five definitions of common support, six estimators, three model specifications, and two sectors, resulting in a total of 180 different estimates. This allows us to isolate the impact of each dimension and to understand their interactions.

The literature that investigates the sensitivity of the gender pay gap with regard to common support and estimation methods is small. Ñopo (2008) studies support violations and the resulting unexplained wage gaps with respect to four exemplary wage determinants with data from Peru. Moreover, he compares exact matching (EXM) with various linear specifications of BO that differ in how flexibly three exemplary wage determinants are included, and whether or not common support is enforced with respect to these variables. Black et al. (2008) and Goraus et al. (2017) compare BO and EXM estimates of the gender pay gap. Djurdjevic and Radyakin (2007), Frölich (2007), and Meara et al. (2020) compare BO estimates of the gender pay gap with different propensity score matching estimators. Relatedly, Barsky et al. (2002) and Graham et al. (2016) study the sensitivity of black-white gaps with regard to different parametric and semi-parametric estimators. Bach et al.
(2018) and Briel and Töpfer (2020) apply machine learning methods for model specification using the post-double-selection procedure. However, none of these studies provides a comparable large-scale systematic analysis of gender pay gap estimates with regard to our considered methodological choices.

Our paper is related to other strands of the literature that investigate the sensitivity of the gender pay gap with regard to other dimensions. One strand of literature focuses on the assumptions required for the identification of gender discrimination. For example, there are studies that investigate biases due to gender-specific selection into employment (e.g., Chandrasekhar et al., 2019; Chernozhukov et al., 2020; Maasoumi and Wang, 2019; Machado, 2017; Neuman and Oaxaca, 2004; Olivetti and Petrongolo, 2008) and potential endogeneity of the observed wage determinants (e.g., Kunze, 2008; Huber, 2015; Huber and Solovyeva, 2020; Yamaguchi, 2015). We maintain the same identifying assumptions for all our estimates. Thus, differences in our estimates result from the empirical methods and not from the underlying assumptions. To be precise, we measure the unexplained gender pay gap as the expected relative wage difference of employed women in the sample with support, compared to employed men with the same observed wage determinants. This parameter is informative about equal pay for equal work, taking individual choices as given. It does not necessarily measure gender discrimination in pay because gender discrimination might influence eventual pay earlier in life through, for instance, educational and career choices. Moreover, there may be unobserved factors that help explain observed gender differences in pay. A large strand of literature identifies wage determinants that contribute to explaining the gender pay gap and analyzes how including these wage determinants changes estimates of the unexplained gender pay gap.
We discuss this literature in Section 2.2 when we describe the variables we observe in our data. In our study, we keep the wage determinants for which we control constant across all estimates and only vary how flexibly we include them in the estimations.

We find that all the methodological choices we consider significantly impact the size of the estimated unexplained gender pay gap. The estimates decline by up to 50% when stricter support is enforced, functional forms are relaxed, and less restrictive estimators are used. The lack of comparable men for each woman is a particularly serious issue. For 89% and 70% of employed women in the private and public sector, respectively, there is no support with regard to at least one observed wage determinant. For the public sector, we find that lack of support directly explains a large part of the raw gender pay gap. Moreover, with estimates that are 6-50% lower, enforcing support strongly affects the estimates of the unexplained gaps for all estimators. Thus, checking support in applications ex ante and deciding on how strictly it should be enforced is crucial to applied work.

With estimates that are around 5-19% lower, the flexible inclusion of wage determinants is very important for the parametric estimators LRM and BO as well as the hybrid AIPW. In contrast, model specification has little impact on the results from semi-parametric matching estimators that do not model the wage equation. These estimators do not restrict heterogeneity in unexplained wage gaps, which we find to be important as well. Compared to the most flexible version of BO and with a reasonable choice of common support, flexible EXPSM reduces the estimated unexplained wage gap by another 14% in the private sector and 35% in the public sector.
The results for semi-parametric IPW are ambiguous, which might be related to propensity score values close to one (e.g., Khan and Tamer, 2010; Busso et al., 2014).

Based on our findings, we recommend enforcing common support with respect to the most important wage determinants ex ante. To estimate the unexplained pay gap, we recommend combining exact matching on some important wage determinants with radius matching on a flexibly specified propensity score. This minimizes the risk of functional form misspecification and offers a reasonable balance of comparability, precision of the estimate, and representativeness of the study sample. For the private and public sector, respectively, implementing these recommendations with our data explains 16% and 43% more of the raw wage gap than standard BO estimates and results in estimated unexplained pay gaps that are 23% and 50% lower. An important takeaway for policy makers is that the commonly reported BO estimates of the gender pay gap can be misleading.

The paper proceeds as follows. The next section describes the data we use. In Section 3, we explain our empirical model and estimation methods. Section 4 contains the results, starting with the analysis of common support, then followed by the results for the mean unexplained gender pay gap as a function of the three dimensions of methodological choices we consider. In Section 5, we discuss the generalizability of our results and provide recommendations for applied work. The last section concludes. Online appendices A-D contain supplementary material.

Data

We use the 2016 wave of the Swiss Earnings Structure Survey (ESS). The ESS is a bi-annual survey of approximately 37,000 private and public establishments with individual data on more than 1.7 million employees, representing almost one third of all employees in Switzerland. The survey covers salaried jobs in the secondary and tertiary sectors in establishments with at least three employees. Sampling is random within strata defined by establishment size, industry, and geographic location. Participation in the survey is compulsory for the establishments. The gross response rate is higher than 80%. All results we report take into account the sampling weights provided by the ESS that correct for both stratification and non-response. Typically, establishments report the required information directly from their remuneration systems. Thus, the survey effectively includes administrative data from establishments.

We restrict the analysis to the working population aged between 20 and 59 years (dropping 127,298 employees). We exclude employees for whom we observe very little support between men and women ex ante. This excludes 70,052 employees with less than 20 percent part-time employment, 2,706 members of the armed forces or agricultural and forestry occupations, and 3,025 employees from the agricultural, forestry, mining, and tobacco sectors. We analyze the gender pay gap separately for the private and public sector. The public sector offers more regulated wages and attracts a different selection of employees. For example, women are over-represented in the public sector at 56%, but constitute only 43% of the private sector. Moreover, the public sector is much more homogeneous in terms of industries and occupations. The baseline sample contains 1,132,042 employees in the private sector and 405,448 employees in the public sector, after dropping an additional 54 employees in industries with very few observations in the public sector.
The main variable of interest is a standardized wage measure that is provided by the Federal Statistical Office as part of the ESS. It measures the monthly full-time-equivalent gross wage including extra payments. The latter comprise add-ons for shift, Sunday, and night work, other non-standard working conditions, and irregular payments such as bonuses and Christmas or holiday salaries, but they exclude overtime premia. Wages are standardized to a 100% full-time equivalent without overtime hours.

Footnotes to the sample restrictions above:
Armed forces and agricultural and forestry occupations: based on 2-digit ISCO-08 codes 01, 02, 03, 61, and 62.
Agricultural, forestry, mining, and tobacco sectors: based on 2-digit NOGA 2008 codes 01, 02, 05, and 12.
Industries with very few observations in the public sector: based on 2-digit NOGA 2008 codes 10, 31, 45, 47, 55, and 58.

Table 1

                                           Private Sector                 Public Sector
                                          Mean      Mean      Std.      Mean      Mean      Std.
                                         Women       Men     Diff.     Women       Men     Diff.
                                           (1)       (2)       (3)       (4)       (5)       (6)
Remuneration characteristics
  Standardized monthly wage (in CHF)     6,266     7,793      31.9     7,731     8,985      42.2
  Irregular payments                      .330      .411      16.9      .138      .227      23.3
Demographic characteristics
  Age
    20-29 years                           .206      .186       5.1      .159      .116      12.5
    30-39 years                           .270      .285       3.4      .266      .250       3.7
    40-49 years                           .275      .282       1.5      .282      .300       3.9
    50-59 years                           .249      .248        .4      .293      .335       8.9
  Education
    Higher                                .284      .320       7.9      .539      .570       6.2
    Vocational                            .474      .466       1.7      .279      .288       2.1
    No vocational                         .187      .168       4.9      .074      .048      11.0
Employment characteristics
  Tenure < [...]
  [Occupations by share of women]
    < 25%                                 .044      .297      71.4      .028      .107      32.0
    25-50%                                .381      .518      27.7      .296      .440      30.3
    50-75%                                .346      .158      44.3      .395      .278      24.9
    > 75%                                 .229      .027      63.2      .214      .131      22.2
  Part-time and full-time work
    20-49%                                .195      .038      50.6      .187      .059      39.8
    50-79%                                .235      .046      56.4      .303      .086      57.1
    80-99%                                .164      .066      31.0      .221      .124      25.9
    100%                                  .406      .850     103.4      .290      .732      98.6
  Establishment size
    < 20 Employees                        .255      .215       9.5      .015      .017       1.4
    20-49 Employees                       .131      .162       8.9      .029      .027       1.1
    50-249 Employees                      .237      .258       5.0      .123      .091      10.3
    250-999 Employees                     .142      .157       4.4      .117      .108       3.0
    > 999 Employees                       .236      .207       6.9      .716      .756       9.4
Observations                           491,007   641,035             227,617   177,831

Notes: Table is based on the baseline sample before imposing sample restriction on support. The monthly regular wage is reported by employers. It is standardized to 100% full-time equivalent wages without overtime hours. Table A.1 in the Online Appendix A provides a detailed description of all observed variables. Rosenbaum and Rubin (1983) classify absolute standardized difference (std. diff.) of more than 20 as “large”.

Table 1 documents the means and standardized differences of selected variables by sector and gender (see Table A.1 in Online Appendix A for a full list of all observed characteristics). In the private sector, the average standardized monthly wage is 6,266 CHF for women and 7,793 CHF for men. In the public sector, the average wage is much higher at 7,731 CHF for women and 8,985 CHF for men. In both sectors, the share of women receiving irregular payments such as bonuses is notably smaller than that of men. This aligns with a similar difference in the share of employees holding a management position.

The distributions of age and education are relatively similar across gender in both sectors, although employed women tend to have slightly less education than men. Women are somewhat less likely to have long tenure and significantly less likely to work full-time than men. Labor market segregation by occupations is very strong. We illustrate this by showing the employment shares for occupations grouped by their shares of women. Unsurprisingly, more women work in female-dominated occupations than men and vice versa. Female-dominated occupations are more frequent in the public sector. Less than 3% of men in the private sector, but more than 13% of men in the public sector work in female-dominated occupations.
Gender differences by establishment size show no strong systematic pattern.

While the data are quite rich with regard to available wage determinants, there are several important variables that we do not observe. One example is actual work experience (e.g., Cook et al., 2020; Gayle and Golan, 2012). Potential experience is captured by age and education, and we also observe tenure. Hence, we capture some aspects of experience, but not all of it. Since experience is positively related to wages, and women have on average less work experience, we over-estimate potential violations of equal pay for equal work in Switzerland in this respect. However, other information that the literature emphasizes is missing as well. This includes, for example, competitiveness, children, environment during childhood, gender norms, and non-cognitive factors. Moreover, we account for neither selection into employment based on unobserved characteristics, nor for potential endogeneity of control variables. Hence, our es-

Footnotes:
See Bayard et al. (2003), Beaudry and Lewis (2014), Bertrand et al. (2019), Bertrand et al. (2010); Bertrand and Hallock (2001), Brenzel et al. (2014), Bruns (2019), Buffington et al. (2016), Card et al. (2016), Fernández and Wong (2014), Gobillon et al. (2015), Goldin et al. (2017), Heinze and Wolf (2010), Liu (2016), Manning and Petrongolo (2008), Oberfichtner et al. (2020), Sin et al. (2020), and Winter-Ebmer and Zweimüller (1997) among others.
The conversion rate of CHF to US dollars was approximately 1:1 in 2016.
See, e.g., Flory et al. (2015) and Gneezy et al. (2009, 2003) for competitiveness, Adda et al. (2017), Anderson et al. (2002), Angelov et al. (2016), Bailey et al. (2012), Bütikofer et al. (2018), Ejrnæs and Kunze (2013), Fitzenberger et al. (2016), Kleven et al. (2019a,b), Krapf et al. (2020), Lundborg et al. (2017), and Waldfogel (1998) for children, Autor et al. (2019), Bertrand and Pan (2013), and Brenøe and Lundberg (2018) for environment during childhood, Bertrand et al. (2015) and Roth and Slotwinski (2018) for gender norms, and Fortin (2008) for non-cognitive factors.

We denote the gender dummy by $G_i$, with $G_i = 1$ for employed women and $G_i = 0$ for employed men. We use the logarithm of the standardized monthly wage as the outcome variable, which we denote by $Y_i$. The raw gender pay gap is

\[ \Delta = E[Y_i \mid G_i = 1] - E[Y_i \mid G_i = 0]. \tag{1} \]

The raw gender pay gap can be decomposed into an explained and unexplained part. The vector $X_i$ contains observed demographic and labor market characteristics of the employees as well as observed characteristics of their employers. The predicted wage of employed men, if they had the same observed characteristics as employed women, is $E_{X \mid G=1}[\mu(x)]$, with $\mu(x) = E[Y_i \mid G_i = 0, X_i = x]$. Adding and subtracting $E_{X \mid G=1}[\mu(x)]$ in (1) gives

\[ \Delta = \underbrace{E[Y_i \mid G_i = 1] - E_{X \mid G=1}[\mu(x)]}_{\text{unexplained } \delta} + \underbrace{E_{X \mid G=1}[\mu(x)] - E[Y_i \mid G_i = 0]}_{\text{explained } \eta}. \tag{2} \]

The second difference on the right side of (2) is the part of the raw gender pay gap that can be explained by gender differences in the observed wage determinants $X_i$. The first difference on the right side of (2) is the gender pay gap for employed women that cannot be explained by gender differences in the observed wage determinants. It is the expected difference in pay of employed women compared to observationally identical employed men, which we denote by

\[ \delta = E[Y_i \mid G_i = 1] - E_{X \mid G=1}[\mu(x)]. \tag{3} \]

This is the parameter of interest in the majority of studies on gender wage inequality.

The unexplained pay gap (3) is only identified if, for all females, there exist men that are observationally identical with respect to the wage determinants. Now assume that there is lack of support for some men or women. Let $S_i = 1$ for individuals with support and $S_i = 0$ for individuals without support.
Ñopo (2008) shows that

\[ \Delta = \underbrace{E[Y_i \mid G_i = 1, S_i = 1] - E[Y_i \mid G_i = 0, S_i = 1]}_{= \Delta_{S=1}} + \Delta^{s}_{G=1} - \Delta^{s}_{G=0}, \]

where $\Delta^{s}_{G=g} \equiv \Pr(S_i = 0 \mid G_i = g)\,\big[E[Y_i \mid G_i = g, S_i = 0] - E[Y_i \mid G_i = g, S_i = 1]\big]$ measures wage differences across individuals of gender $g$ (for $g \in \{0, 1\}$) in and out of support. The parameter $\Delta_{S=1}$ is the raw wage gap within support, which we can decompose as

\[ \Delta_{S=1} = \underbrace{E[Y_i \mid G_i = 1, S_i = 1] - E_{X \mid G=1, S=1}[\mu(x) \mid S_i = 1]}_{\delta_{S=1}} + \underbrace{E_{X \mid G=1, S=1}[\mu(x) \mid S_i = 1] - E[Y_i \mid G_i = 0, S_i = 1]}_{\eta_{S=1}}. \tag{4} \]

The first right-hand term, $\delta_{S=1}$, is the unexplained gender pay gap for employed women with support. The second right-hand term, $\eta_{S=1}$, is the part of the raw gender pay gap with support that can be explained by gender differences in the wage determinants.

We use the following procedure to analyze the influence of common support violations on gender pay gap estimates. First, we sequentially increase the number of wage determinants for which we impose common support and then study the share of females without support. Imposing common support based on a large number of wage determinants increases ex-ante comparability of women and men. Second, we estimate the elements of (4) for each sequential step using an exact matching estimator (see Section 3.3.4 for more details). Third, based on the sequential analysis, we pick five exemplary definitions of support and estimate the unexplained gender pay gap on support, $\delta_{S=1}$. Specifically, we restrict the sample to the employees on support and implement different estimators for the unexplained gender pay gap, which we explain in the next section.

There exists a very large set of possible parametric and semi-parametric estimators, doubly-robust mixtures between the two types of estimators, and non-parametric estimators.
We focus on common estimators that are easy to implement with the objective of showing the main trade-offs in estimator choice. We apply each estimator with five different support conditions where we restrict the sample to the observations satisfying the respective support definition.

3.3.1 Dummy regression (LRM)

A simple estimation approach for the unexplained gender pay gap employs a linear regression model (LRM) with a dummy for women,

\[ Y_i = \alpha + G_i \delta_{LRM} + X_i \beta + \varepsilon_i, \tag{5} \]

where $\alpha$ is a constant, the vector $\beta$ describes the association between the wage determinants $X_i$ and the wages, and $\varepsilon_i$ is an error term. The parameter $\delta_{LRM}$ can be interpreted as the unexplained gender pay gap.

The LRM imposes the restriction that the coefficients of the wage determinants as well as the unexplained gender pay gap are homogeneous across all individuals. This includes the assumption that the unexplained gender pay gap is equal for women and men. This modelling assumption is unnecessarily restrictive, because it can be relaxed by including interaction terms between gender and the wage determinants, which we discuss in the next section in the context of the BO decomposition.
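As an illustration of (5), the following minimal sketch estimates $\delta_{LRM}$ by ordinary least squares on simulated data. The function names `simulate` and `lrm_gap`, the simulated coefficients, and the use of plain NumPy are assumptions made for this example only. The sketch also ignores the ESS sampling weights that the estimates reported in this paper take into account.

```python
import numpy as np


def simulate(n=20_000, seed=0):
    """Hypothetical data: log wages with a homogeneous unexplained gap of -0.08."""
    rng = np.random.default_rng(seed)
    g = rng.integers(0, 2, size=n)                  # 1 = woman, 0 = man
    x = rng.normal(size=(n, 2)) + 0.3 * g[:, None]  # wage determinants differ by gender
    noise = rng.normal(scale=0.2, size=n)
    y = 8.5 + 0.10 * x[:, 0] + 0.05 * x[:, 1] - 0.08 * g + noise
    return y, g, x


def lrm_gap(y, g, x):
    """OLS coefficient on the gender dummy in Y = alpha + G*delta + X*beta + eps."""
    design = np.column_stack([np.ones(len(y)), g, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # position 1 holds the coefficient on G


if __name__ == "__main__":
    y, g, x = simulate()
    print(f"estimated delta_LRM: {lrm_gap(y, g, x):.4f}")
```

Because the simulated gap is the same for everyone by construction, the single dummy coefficient is an adequate summary here; the restriction discussed above matters once the gap varies with $X_i$.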

The BO decomposition is a two-step estimator that allows for heterogeneous unexplained gender pay gaps (Blinder, 1973; Oaxaca, 1973). In the first step, we estimate in the subsample of men the linear wage model

\[ Y_i = \alpha_0 + X_i \beta_0 + u_i, \tag{6} \]

where $\alpha_0$ is an intercept and $u_i$ is an error term. The coefficients $\beta_0$ describe the association between the wage determinants $X_i$ and the wages of men. In the second step, we use the estimated coefficients from this regression to predict the counterfactual male wage $\hat{\mu}(X_i) \equiv \hat{\alpha}_0 + X_i \hat{\beta}_0$ for each woman in the sample and estimate the mean unexplained gender pay gap for women using

\[ \hat{\delta}_{BO} = \frac{1}{N_1} \sum_{i=1}^{N} G_i \big( Y_i - \hat{\mu}(X_i) \big) \tag{7} \]

with $N_1 = \sum_{i=1}^{N} G_i$ (hats indicate estimated parameters). Note that this estimation procedure is numerically identical to using a linear prediction for female wages, $Y_i = \alpha_1 + X_i \beta_1 + v_i$, in the sample of women and calculating the unexplained part of the pay gap as

\[ \hat{\delta}_{BO} = (\hat{\alpha}_1 - \hat{\alpha}_0) + \frac{1}{N_1} \sum_{i=1}^{N} G_i X_i (\hat{\beta}_1 - \hat{\beta}_0), \]

because

\[ \frac{1}{N_1} \sum_{i=1}^{N} G_i Y_i = \hat{\alpha}_1 + \frac{1}{N_1} \sum_{i=1}^{N} G_i X_i \hat{\beta}_1. \]

In contrast to the LRM, this version of the BO decomposition allows for gender differences in the impact of characteristics $X_i$ on wages as well as for heterogeneity in the unexplained gender pay gap that is driven by these differences.

The BO model corresponds to the LRM augmented with interaction terms between gender and all observable wage determinants:

\[ Y_i = \alpha_0 + G_i \underbrace{(\alpha_1 - \alpha_0)}_{= \alpha_{BO}} + X_i \beta_0 + G_i X_i \underbrace{(\beta_1 - \beta_0)}_{= \beta_{BO}} + \epsilon_i. \tag{8} \]

Using this fully interacted LRM, we could estimate the BO unexplained wage gap by

\[ \hat{\delta}_{BO} = \hat{\alpha}_{BO} + \frac{1}{N_1} \sum_{i=1}^{N} G_i X_i \hat{\beta}_{BO}. \]

This estimation procedure is numerically identical to (7) when we control for the same characteristics $X_i$ as in (6).

Inverse probability weighting (IPW) estimators are also two-step estimators (Hirano et al., 2003; Horvitz and Thompson, 1952).
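Before turning to the details of IPW, the numerical equivalence claimed for BO can be checked directly. The sketch below is a minimal illustration on simulated data: it computes the two-step form (7) and the form based on separate regressions for men and women, and the two should agree up to floating-point error. All function names and the data-generating process are hypothetical, and sampling weights are omitted.

```python
import numpy as np


def simulate(n=20_000, seed=1):
    """Hypothetical data in which the return to x[:, 0] differs by gender."""
    rng = np.random.default_rng(seed)
    g = rng.integers(0, 2, size=n)                  # 1 = woman, 0 = man
    x = rng.normal(size=(n, 2)) + 0.3 * g[:, None]
    noise = rng.normal(scale=0.2, size=n)
    y = 8.5 + 0.10 * x[:, 0] + 0.05 * x[:, 1] + g * (-0.05 - 0.03 * x[:, 0]) + noise
    return y, g, x


def ols(y, x):
    """Return [intercept, slopes...] from a least-squares fit with a constant."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def bo_two_step(y, g, x):
    """Eq. (7): fit the wage model on men, predict for women, average the difference."""
    men, women = g == 0, g == 1
    coef0 = ols(y[men], x[men])
    mu_hat = coef0[0] + x[women] @ coef0[1:]        # counterfactual male wage
    return np.mean(y[women] - mu_hat)


def bo_two_regressions(y, g, x):
    """Same gap from separate regressions for men (0) and women (1)."""
    men, women = g == 0, g == 1
    coef0, coef1 = ols(y[men], x[men]), ols(y[women], x[women])
    x_bar_women = x[women].mean(axis=0)
    return (coef1[0] - coef0[0]) + x_bar_women @ (coef1[1:] - coef0[1:])


if __name__ == "__main__":
    y, g, x = simulate()
    print(bo_two_step(y, g, x), bo_two_regressions(y, g, x))
```

The agreement relies on the women's regression including an intercept, so that its fitted values average to the women's mean wage.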
In the first step, we estimate the conditional probability of being a woman, the so-called propensity score, based on the model

\[ p(X_i) = \Pr(G_i = 1 \mid X_i) = F(X_i \gamma), \]

where $F(\cdot)$ is a binary link function (e.g., the logistic distribution). In the second step, we predict $\hat{p}(X_i)$ for all observations and estimate the re-weighted sample average

\[ \hat{\delta}_{IPW} = \frac{1}{N_1} \sum_{i=1}^{N} G_i Y_i - \sum_{i=1}^{N} \hat{W}_i Y_i, \]

with the IPW weights

\[ \hat{W}_i = \frac{(1 - G_i)\,\hat{p}(X_i)}{1 - \hat{p}(X_i)} \Bigg/ \sum_{j=1}^{N} \frac{(1 - G_j)\,\hat{p}(X_j)}{1 - \hat{p}(X_j)}. \tag{9} \]

The denominator in (9) guarantees that the IPW weights add up to one in finite samples (see, e.g., Busso et al., 2014). In contrast to LRM and BO, IPW does not impose any specific functional form on the relationship between $X_i$ and the wage, and it does not restrict heterogeneity in unexplained gender pay gaps. A potential disadvantage of IPW is its instability when the conditional probability of being a woman is close to one (e.g., Khan and Tamer, 2010), which may result in high variance of $\hat{\delta}_{IPW}$. We impose trimming rules to avoid propensity score values that are too extreme (see the discussion in, e.g., Lechner and Strittmatter, 2019). In particular, we omit males with large weights $\hat{p}(X_i) / (1 - \hat{p}(X_i))$ above the 99.5% quantile. We document the number of trimmed observations in Table B.1 of Online Appendix B.

Kline (2011) argues that BO estimators are equivalent to propensity score re-weighting estimators, but in contrast to IPW they model the propensity score linearly. As a result, the implicit weights of BO can become negative, which is not possible for the IPW estimator. However, in the limit to a fully saturated BO model, even the linear specification of the propensity score would be well behaved and negative weights would be unlikely.

Augmented inverse probability weighting (AIPW) dates back to Robins et al. (1994) and has received significant attention since Chernozhukov et al.
(2017) proposed this estimation procedure in the context of machine learning. AIPW is a doubly-robust mixture between the BO and IPW approach,

$$\hat{\delta}_{AIPW} = \frac{1}{N_1} \sum_{i=1}^{N} G_i \left( Y_i - \hat{\mu}_0(X_i) \right) - \sum_{i=1}^{N} \hat{W}_i \left( Y_i - \hat{\mu}_0(X_i) \right), \tag{10}$$

with $\hat{\mu}_0(X_i)$ as in BO. The first right-hand term in (10) is equivalent to the BO in (7). The second right-hand term in (10) has an expected value of zero, but makes a finite sample adjustment by re-weighting the observable bias of $\hat{\mu}_0(X_i)$ in the sample of men with the IPW weights.

AIPW is more robust to misspecification than BO or IPW. In particular, $\hat{\delta}_{AIPW}$ is consistent even when either $\hat{\mu}_0(x)$ or $\hat{p}(x)$ is misspecified. Moreover, the theoretical properties of AIPW are well established for generic estimators of the nuisance parameters $\hat{\mu}_0(x)$ and $\hat{p}(x)$. In particular, $\sqrt[4]{N}$-consistency of $\hat{\mu}_0(x)$ and $\hat{p}(x)$ is sufficient to achieve $\sqrt{N}$-consistency of $\hat{\delta}_{AIPW}$ (in combination with the cross-fitting procedure described below). This permits the application of flexible machine learning methods to estimate $\hat{\mu}_0(x)$ and $\hat{p}(x)$, which often have a convergence rate below $\sqrt{N}$. AIPW combined with machine learning is often called double-machine-learning (see Knaus, 2020, for a review).

Footnote: We apply the normalization described in (9) after the trimming.

Footnote: The usual practice of IPW estimators is to use a Logit or Probit estimator for the propensity score.

Exact matching (EXM) as used, for example, by Ñopo (2008), is a fully non-parametric estimation approach. We stratify the sample into $K \ll N$ mutually exclusive groups $W_i \in \{1, ..., K\}$, which [...] $X_i$. The EXM estimator is

$$\hat{\delta}_{EXM} = \frac{1}{N_1} \sum_{i=1}^{N} G_i \left[ Y_i - \frac{\sum_{j=1}^{N} \mathbb{1}\{W_i = W_j\} (1 - G_j) Y_j}{\sum_{j=1}^{N} \mathbb{1}\{W_i = W_j\} (1 - G_j)} \right].$$

EXM is more flexible than LRM, BO, IPW and AIPW. However, it suffers from the curse of dimensionality.
When we consider many strata $K$, the risk of lacking support (i.e., finding no men matching specific strata of women) increases. Unlike the previous estimators, EXM cannot extrapolate into regions without support. Therefore, the estimation breaks down in the presence of support violations. Furthermore, even when there is support, there may be strata with very few men, which might lead to high variance of $\hat{\delta}_{EXM}$. Despite these potential disadvantages, our large data set with more than one million observations remains suitable for EXM. However, we have to define relatively coarse groups for the two continuous variables of age and tenure to avoid empty strata. For the same reason, we have to combine some industries and occupations with only a few observations into more loosely similar groups. We implement EXM as in Ñopo (2008); i.e., we exactly match the stratified variables that define common support. This implies that the versions with less restrictive support definitions also match on fewer variables, thus removing fewer differences in observed wage determinants. In the private [public] sector, we have 140 [142] strata for the least restrictive support definition and 844,100 [208,860] strata for the most restrictive one.

We consider two additional semi-parametric matching estimators to account for potential concerns regarding the curse of dimensionality and unmatched wage determinants within strata. First, we use propensity score radius matching (PSM) (see, e.g., Frölich, 2007; Lechner and Wunsch, 2009; Lechner et al., 2011). The propensity score $p(X_i)$ is the same as for IPW. PSM is a one-dimensional matching approach (as opposed to the multi-dimensional matching of EXM). To each woman, we match the men who have a propensity score value within a certain absolute difference (radius) of the woman's propensity score value. We define the radius as the 99% quantile of the distribution of the closest absolute distances for all women, then omit women without a match within the radius.
Table B.2 in Online Appendix B documents the number of females we use for the different matching estimators. We lose only relatively few women who lack matching men within the defined radius.

Second, we consider mixtures between exact and propensity score radius matching (EXPSM). Here, we exactly match on the wage determinants that define support, as in EXM, and apply propensity score radius matching within each stratum using all wage determinants, as in PSM. In contrast to EXM, the propensity score corrects semi-parametrically for remaining within-strata observable differences between women and men. In contrast to PSM, we enforce exact comparability of women and men with respect to some wage determinants.

To estimate the so-called nuisance parameters, i.e., the wage equation $\hat{\mu}_0(x)$ and the propensity score $\hat{p}(x)$, we consider three different models. We call them the baseline, full, and machine learning (ML) models. The baseline and full models include the same wage determinants, but we vary the flexibility in terms of non-linear (e.g., polynomials, categorical dummies) and interaction terms of the wage determinants. The ML model can select the relevant wage determinants and the model flexibility in a data-driven way, using the full model specification as input. In practice, it is often unclear which non-linear and interaction terms are relevant. Using too parsimonious a model could bias $\delta$ due to model misspecification. Allowing for too much flexibility could lead to imprecise estimates of $\delta$ with high variance, especially in smaller samples. Accordingly, we face a bias-variance trade-off.

In the baseline model, we control for all observed variables in a standard way.
Specifically, we include age (linear and squared), tenure (linear and squared), vocational education (9 categories), citizen status (6 categories), marital status (3 categories), occupation (39 [20] categories in the private [public] sector), industry sector (36 [12] categories in the private [public] sector), management position (5 categories [and a missing dummy for public sector]), region (7 categories), establishment size (5 categories), and employment share as a fraction of full time (4 categories). Furthermore, we control for the dummy variables of temporary employment, employment contract with hourly wages, collective wage agreement, overtime hours payment, bonus payments (e.g., from profit sharing), supplementary wages (e.g., for shift work), and extra salary (e.g., Christmas and holiday salaries). Overall, the baseline model includes 117 control variables in the specification for the private sector and 74 control variables in the specification for the public sector.

In the full model, we add to the baseline model all non-linear and interaction terms that we think could potentially be relevant. The non-linear terms are the polynomials of age and tenure up to order seven, as well as four age and five tenure categories. Furthermore, we include interaction terms between control variables to allow for heterogeneous returns to important wage determinants. Specifically, we interact occupation and industry groups with the categorical [...]

Footnote: It should be noted that we implement PSM after matching exactly on the three variables that define the least restrictive Support 1, because this improves matching without lack of support. As a result, PSM and EXPSM are identical for Support 1 but not for the other support definitions.

Footnote: We partially coarsen the categorical variables to improve support of the interaction terms.
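The two-step Blinder-Oaxaca estimator in (6) and (7), and its numerical equivalence to the two-regression form, can be illustrated with a minimal sketch. The simulated data, the function names (`ols`, `bo_plugin`, `bo_two_regressions`) and the true gap of -0.10 are assumptions made for this illustration only; they are not the data or code used in the paper.

```python
import numpy as np

def ols(X, y):
    """OLS with intercept; returns [intercept, slopes...]."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

def bo_plugin(y, X, g):
    """Equation (7): fit the wage model on men, predict for women, average the difference."""
    men, women = g == 0, g == 1
    coef_men = ols(X[men], y[men])
    mu0_women = coef_men[0] + X[women] @ coef_men[1:]  # counterfactual male wage
    return float(np.mean(y[women] - mu0_women))

def bo_two_regressions(y, X, g):
    """Equivalent form: difference in intercepts plus difference in returns at women's mean X."""
    men, women = g == 0, g == 1
    coef_men, coef_women = ols(X[men], y[men]), ols(X[women], y[women])
    xbar_women = X[women].mean(axis=0)
    return float((coef_women[0] - coef_men[0])
                 + xbar_women @ (coef_women[1:] - coef_men[1:]))

# Hypothetical simulated data: g = 1 for women, true unexplained gap of -0.10 log points.
rng = np.random.default_rng(0)
n = 5000
g = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + 0.3 * g[:, None]
y = 1.0 + X @ np.array([0.5, -0.2]) - 0.10 * g + rng.normal(scale=0.1, size=n)

gap_plugin = bo_plugin(y, X, g)
gap_two_reg = bo_two_regressions(y, X, g)
print(gap_plugin, gap_two_reg)
```

Both functions return the same number up to floating-point error, because OLS with an intercept fits the mean wage of women exactly.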

In contrast to manual model specification, machine learning estimators have the potential to balance the bias-variance trade-off in a data-driven way. We use the LASSO to specify our ML models. The LASSO is a penalized regression method (see Hastie et al., 2009, for a detailed description of the LASSO). It can balance the bias-variance trade-off by shrinking some coefficients towards zero. Eventually, some covariates are excluded from the model when the coefficients are exactly zero. Accordingly, the LASSO is a model selection device. To obtain the predicted wages of men $\hat{\mu}_0(x)$ in the baseline and full models for BO, we estimate in the sample of men the OLS model

$$(\hat{\alpha}_0, \hat{\beta}_0) = \arg\min_{a,b} \frac{1}{N} \sum_{i=1}^{N} (Y_i - a - X_i b)^2,$$

and predict the conditional male wages by $\hat{\mu}_0(x) = \hat{\alpha}_0 + x \hat{\beta}_0$. In contrast, the LASSO objective function is

$$(\hat{\alpha}_0, \hat{\beta}_0) = \arg\min_{a,b} \frac{1}{N} \sum_{i=1}^{N} (Y_i - a - X_i b)^2 + \lambda \lVert b \rVert_1, \tag{11}$$

with the penalty term $\lambda \lVert b \rVert_1 = \lambda \sum_{p=1}^{P} |b_p|$. The tuning parameter $\lambda \geq 0$ controls the strength of the penalty on the coefficients $\hat{\beta}_0$. If $\lambda = 0$, the ML model is equivalent to the full model. If $\lambda > 0$, some coefficients $\hat{\beta}_0$ shrink towards zero and can eventually become exactly zero. Shrinking coefficients to zero is equivalent to excluding the corresponding control variable in $X_i$ from the wage equation. In the extreme case, when $\lambda \to \infty$, all coefficients in $\hat{\beta}_0$ are exactly zero and only the intercept $\hat{\alpha}_0$ is non-zero, because it is not penalized. We determine the tuning parameter $\lambda$ with a 5-fold cross-validation procedure (Chetverikov et al., 2017). The cross-validation procedure selects the $\lambda$ that minimises the mean-squared-error (MSE). To smooth the ML model, we apply the one-standard-error rule to the minimum $\lambda$ (see, e.g., Hastie et al., 2009).

Footnote: Alternatively, the choice of the tuning parameter $\lambda$ can be based on information criteria (Zou et al., 2007) or data-driven procedures (Belloni et al., 2012).

We use similar models to estimate the propensity score $\hat{p}(x)$. However, instead of using OLS, we use the Logit estimator for the baseline and full model. Likewise, we use Logit-LASSO instead of the standard LASSO approach for the ML model (see Hastie et al., 2016, for an introduction to Logit-LASSO). In Figures C.1-C.8 in Online Appendix C, we report the density of the estimated propensity scores by gender.
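The role of the tuning parameter in the LASSO objective (11) can be made concrete with a small coordinate-descent sketch. This is an illustrative implementation on simulated data, not the software used in the paper; the function names `soft_threshold` and `lasso_cd`, the fixed number of sweeps, and the chosen values of `lam` are assumptions, and the cross-validation step with the one-standard-error rule is omitted.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/N) * sum((y - a - X b)^2) + lam * sum(|b|) by coordinate descent."""
    n, p = X.shape
    a, b = y.mean(), np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - a - X @ b + X[:, j] * b[j]  # residual without covariate j
            rho = X[:, j] @ partial_resid / n
            # First-order condition of the objective: threshold is lam / 2.
            b[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
        a = np.mean(y - X @ b)  # the intercept is not penalised
    return a, b

# Hypothetical data: only covariates 0 and 3 matter for the outcome.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 5))
y = 2.0 + X @ np.array([1.0, 0.0, 0.0, -0.5, 0.0]) + rng.normal(scale=0.5, size=n)

a_ols, b_ols = lasso_cd(X, y, lam=0.0)           # lam = 0: plain least squares
a_sparse, b_sparse = lasso_cd(X, y, lam=0.4)     # moderate lam: irrelevant terms dropped
a_big, b_big = lasso_cd(X, y, lam=100.0)         # very large lam: intercept only
selected = np.flatnonzero(b_sparse)
print(selected, b_sparse)
```

With `lam=0.0` the result coincides with OLS, with a moderate penalty only the two relevant covariates keep non-zero coefficients, and with a very large penalty only the unpenalized intercept (the sample mean of `y`) remains, mirroring the three cases described above.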

Post-double-selection (PDS) procedure.

The LASSO version of the LRM corresponds to the PDS procedure of Belloni et al. (2013, 2014). The conventional LASSO has the purpose of predicting $Y_i$, but it is not designed to obtain an unbiased estimate of the structural parameter $\delta_{LRM}$. To illustrate this, the application of LASSO to (5) without penalizing the gender dummy would most likely lead to the omission of those wage determinants that are highly correlated with gender, such that an omitted variable bias is inevitable. The PDS procedure enables LASSO to select the relevant wage determinants in a data-driven way, without the threat of omitting important variables. The PDS procedure has three steps. First, the LASSO selects the relevant control variables in the linear wage equation $Y_i = \alpha_Y + X_i \beta_Y + \nu_i$, which does not contain the gender dummy $G_i$. In practice, the LASSO can set some coefficients in the vector $\beta_Y$ to zero, which is equivalent to excluding the corresponding control variables in $X_i$ from the wage equation. Second, the LASSO selects the relevant dimensions of $X_i$ in the linear gender model $G_i = \alpha_G + X_i \beta_G + \eta_i$. As before, the LASSO estimator can set some coefficients in the vector $\beta_G$ to zero, such that only those control variables that are strongly correlated with gender remain in the model. We denote the union of all wage determinants with non-zero coefficients in either $\beta_Y$ or $\beta_G$ by $\tilde{X}_i$. Finally, we estimate the linear OLS model (called 'post-LASSO')

$$Y_i = \alpha_{PDS} + G_i \delta_{PDS} + \tilde{X}_i \beta_{PDS} + \tilde{\varepsilon}_i.$$

Belloni et al. (2013) show that the estimator of $\delta_{PDS}$ is consistent and asymptotically normal. The PDS procedure allows for the inclusion of high-dimensional characteristics $X_i$ (i.e., more characteristics than observations), but requires $\tilde{X}_i$ to be sparse. Some approximation error in the selection of the $\tilde{X}_i$ variables is permitted.
The PDS procedure is doubly robust to misspecification of either the earnings equation or the gender model.

For the same reason as for the LRM, applying LASSO in a naive way to the interacted BO in (8) could lead to biased estimates of $\alpha_{BO}$ and $\beta_{BO}$, which could also bias $\hat{\delta}_{BO}$. However, we can use the LASSO to estimate $\hat{\mu}_0(x)$, since $\hat{\mu}_0(x)$ is a pure prediction model for $Y_i$, and then use $\hat{\mu}_0(x)$ as a plug-in estimator in (7). Accordingly, (7) provides a way to combine the Blinder-Oaxaca decomposition with standard machine learning methods. This procedure is called T-learner in the terminology of Künzel et al. (2019). A possible disadvantage of this approach is that we do not obtain estimates of the parameters $\alpha_{BO}$ and $\beta_{BO}$, since they are not required to estimate $\delta_{BO}$. Alternatively, Bach et al. (2018) propose a procedure that is [...] $\alpha_{BO}$ and $\beta_{BO}$.

Footnote: The only exception is the post-double-selection procedure, for which we use a linear model to estimate the propensity score, as proposed in Belloni et al. (2013).

Footnote: In contrast to (11), we use the sample of men and women for the wage equation of the PDS procedure.

Cross-fitting.

To avoid over-fitting and obtain $\sqrt{N}$ convergence for AIPW, it is necessary to estimate the ML models of $\hat{\mu}_0(x)$ and $\hat{p}(x)$ in a different sample than $\hat{\delta}_{AIPW}$ (see Chernozhukov et al., 2018b). We achieve this by using cross-fitting. We partition the sample into two parts of equal size. We estimate $\hat{\mu}_0(x)$ and $\hat{p}(x)$ with the first partition, extrapolate their fitted values to the second partition and use these values to estimate $\hat{\delta}_{AIPW}$. Thereafter, we switch the first and second partition and repeat the procedure, such that the entire data set is used efficiently. We report the average $\hat{\delta}_{AIPW}$ across both partitions.
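The two-fold cross-fitting scheme for the AIPW estimator in (10), with the normalized IPW weights from (9), can be sketched as follows. As a simplifying assumption, plain OLS and a Newton-Raphson logit stand in for the LASSO-based nuisance models, trimming is omitted, and the data are simulated; all function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

def fit_logit(X, g, n_iter=25):
    """Logit by Newton-Raphson (stand-in for the paper's propensity score models)."""
    Z = np.column_stack([np.ones(len(g)), X])
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ coef))
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        coef = coef + np.linalg.solve(hess, Z.T @ (g - p))
    return coef

def linear_index(coef, X):
    return coef[0] + X @ coef[1:]

def aipw(y, g, mu0, p):
    """Equation (10): BO term minus the IPW-weighted residuals of men."""
    odds = (1 - g) * p / (1.0 - p)
    w = odds / odds.sum()          # normalised weights as in (9), non-zero for men only
    resid = y - mu0
    return resid[g == 1].mean() - np.sum(w * resid)

def aipw_cross_fit(y, X, g, seed=0):
    """Fit nuisance models on one half, evaluate (10) on the other, swap, and average."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, 2)
    estimates = []
    for k in (0, 1):
        train, test = folds[1 - k], folds[k]
        men_train = train[g[train] == 0]
        coef_mu = fit_ols(X[men_train], y[men_train])   # wage model, men only
        coef_p = fit_logit(X[train], g[train])          # propensity score model
        mu0 = linear_index(coef_mu, X[test])
        p = 1.0 / (1.0 + np.exp(-linear_index(coef_p, X[test])))
        estimates.append(aipw(y[test], g[test], mu0, p))
    return float(np.mean(estimates))

# Hypothetical simulated data with a true unexplained gap of -0.10.
rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 2))
g = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))).astype(int)
y = 1.0 + X @ np.array([0.5, -0.2]) - 0.10 * g + rng.normal(scale=0.1, size=n)

delta_aipw = aipw_cross_fit(y, X, g)
print(delta_aipw)
```

Because the nuisance models are always fitted on the half of the data that is not used to evaluate (10), over-fitting in the first stage does not carry over mechanically into the estimated gap.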

Selected variables.

We allow the LASSO to select among all control variables of the full model. Table C.1 in Online Appendix C documents the number of selected control variables for all ML models we consider. The final specifications vary by support and estimation procedure. For the wage equation, the ML model considers between 371 and 513 control variables in the different specifications for the private sector and between 215 and 306 control variables for the public sector. For the propensity score model, the ML model considers between 141 and 428 control variables for the private sector and between 126 and 230 control variables for the public sector.

Performance.

In Tables C.2 and C.3 in Online Appendix C, we report the out-of-sample prediction power of the different nuisance parameter models. To obtain the out-of-sample prediction power, we use a cross-fitting procedure. The prediction power of the baseline, full, and ML models does not differ strongly. Accordingly, the baseline model already explains a significant amount of the variation in the data. The prediction power of the full model is systematically better than the prediction power of the baseline model, but the additional gain is moderate. The ML model cannot systematically outperform the full model. The prediction power of the full and ML models is fairly similar, even though the ML model controls for many fewer variables than the full model. This suggests that the full model is too flexible. However, because of our large data set, the prediction power of the full model does not deteriorate compared to the ML model. The loss of degrees of freedom of the full model compared to the ML model appears to be of minor importance. However, we expect that the ML model would outperform the full model in smaller samples.

Footnote: We partition the data in two equally sized samples, using one partition as a training sample and the other as a test sample. Thereafter, we switch the two partitions and report the average prediction power across the two samples.

Results

Our first step is to analyze common support using the approach of Ñopo (2008). We sequentially increase the number of wage determinants for which we impose common support. To determine the order in which we add variables, we run a simple wage regression in the sample of men, i.e., we estimate $\mu_0(x)$ in order to determine the importance of the different variables for explaining men's wages. We measure importance as the change in adjusted $R^2$ when we omit one block of variables, e.g., all occupation dummies, but keep all other wage determinants. We do this separately for the private and public sectors and sort all variable blocks in decreasing order according to the average $R^2$ changes across both sectors. Thus, we start with the most important variable block for explaining male wages and finish with the least important. We report the resulting order of variables and the sector-specific changes in the adjusted $R^2$ in Tables B.3 and B.4 of Online Appendix B for the private and public sector, respectively.

Our choice of ordering based on importance for explaining male wages is motivated by the curse of dimensionality. The larger the number of variables with respect to which we enforce common support, the more likely it is that there will be empty cells. As a result, we expect the share of the original sample for which there is support to decrease as we add more and more variables. This implies that there is some difficulty in achieving a reasonable balance of comparability, sample size and representativeness of the remaining sample. Therefore, we want to impose support with respect to variables that significantly influence the counterfactual male wage, but not necessarily with respect to variables of minor importance.

In Figure 1, we report how the share of women with support falls as we increase the number of wage determinants with respect to which we enforce support.
There is full support with respect to the three most important determinants (management position, education and age) in the private sector, and only 2 women are off support in the public sector. The loss of observations due to lack of support is relatively small when we add industry and occupation, which are both highly relevant for predicting wages. It becomes larger but remains below 20% if we add establishment size and a dummy for irregular payments. Thereafter, however, support deteriorates quickly. Once we enforce support with respect to the 10 variables that are most important for explaining male wages, 55% of women in the private sector and 32% of women in the public sector have no comparable men with respect to these variables. The loss of women is stronger in the private sector than in the public sector because the latter is more homogeneous in terms of occupations and industries. Subsequently adding the less important wage determinants further reduces support, resulting in lack of support for 89% of women in the private sector and 70% of women in the public sector when we enforce full support with respect to all observed wage determinants.

Figure 1: Analysis of common support by sector. Panels: (a) Private sector; (b) Public sector.

Notes: The indicated wage determinants are added sequentially from left to right. The raw gap is the difference between the average log wages of women and men in the sample with full support with respect to the indicated variables. The unexplained gap is based on the counterfactual male wage obtained from exact matching on all wage determinants from the left to the respective determinant within the sample that has full support with respect to these variables. The numbers in parentheses indicate the five support definitions we use for the estimations.
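The logic behind Figure 1, namely dropping women in strata without men and comparing the remaining women with the men in their own stratum, can be sketched on a toy example. The six observations, the integer stratum codes and the function name `exact_matching_gap` are purely hypothetical; in practice the strata would be built from the combination of all wage determinants on which support is enforced.

```python
import numpy as np

def exact_matching_gap(y, g, strata):
    """EXM on support: each woman is compared with the mean wage of men in her stratum."""
    diffs = []
    n_women = int(np.sum(g == 1))
    for s in np.unique(strata[g == 1]):
        men = (strata == s) & (g == 0)
        women = (strata == s) & (g == 1)
        if men.sum() == 0:
            continue  # no comparable men: these women are off support
        diffs.extend(y[women] - y[men].mean())
    share_on_support = len(diffs) / n_women
    return float(np.mean(diffs)), share_on_support

# Toy data: stratum 2 contains one woman and no men.
y = np.array([10.0, 12.0, 9.0, 20.0, 18.0, 15.0])
g = np.array([0, 0, 1, 0, 1, 1])          # 1 = woman
strata = np.array([0, 0, 0, 1, 1, 2])

gap, share = exact_matching_gap(y, g, strata)
print(gap, share)  # gap = -2.0, share = 2/3
```

Adding further matching variables refines the strata, which can only leave the share of women on support unchanged or reduce it; this is the mechanism behind the declining support shares reported above.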

Of course, different orders of variables lead to different evolutions of support. Table B.5 in Online Appendix B shows how support changes when we add these variables in three alternative orders. The first one uses sector-specific $R^2$ changes rather than the average over both sectors to order variables. For the main analysis, we want to keep the results comparable across sectors, which is why we want the same order for both sectors. The order resulting from sector-specific $R^2$ changes differs across sectors for some variables, but the overall differences are moderate. Accordingly, the evolution of support when adding variables is also similar. The second alternative is a random order. Here support remains higher for a larger number of added variables, mainly because critical variables such as industry and tenure are, by chance, added relatively late. This would change with a different random order in which they are added earlier. The last alternative uses an increasing order according to the sector-average $R^2$ changes as the other extreme to a decreasing order. Here, support breaks down rather late. The reasons for this are twofold. First, the ordering prioritizes wage determinants in which women and men do not greatly differ. Second, many of these wage determinants are dummies while more important variables (such as education, occupation and industry) have many different categories. This mechanically splits the sample into more cells than a dummy variable such that cell size is reduced.

Besides studying the evolution of support, we also estimate the elements of (4). The raw gender pay gaps on support, $\Delta_{S=1}$, can be obtained from the average gender difference in wages in the sample with support. To estimate the unexplained gender pay gap for the women with support, $\delta_{S=1}$, we follow Ñopo (2008) and calculate the pay gap that remains after matching exactly on all variables for which we enforce support.
Thus, when adding support-relevant wage determinants, we also increase the number of variables on which we exactly match. The explained gender pay gap on support, $\eta_{S=1}$, can be calculated as the difference between the raw and unexplained gender pay gap within support. We report the main insights from this analysis in Figure 1 and we document the full set of results in Tables B.3 and B.4 in Online Appendix B.

The raw gender pay gap is relatively insensitive to the support definition in the private sector. It decreases only slightly, from 18.6% under the weakest support definition to 17.3% under the strongest one. In contrast, the EXM estimates of the unexplained and explained gender pay gaps are very sensitive to the support definition. The unexplained gender pay gap on support falls from 13.4% under the weakest support definition to 4.2% under the strongest one. Correspondingly, the explained gender pay gap increases from 5.1% to 13.1% as we exactly match on more wage determinants. Interestingly, the results differ substantially for the public sector. Here, a substantial part of the raw gender pay gap is due to women without support. The raw gender pay gap on support decreases from 13.9% under the weakest support definition to only 4.5% under the strongest one. The unexplained gender pay gap on support decreases quickly from 9.7% to 3.5% when we impose support with respect to, and exactly match on, the eight most important wage determinants. Thereafter, imposing stronger support and matching on more variables leaves the unexplained gender pay gap almost unchanged although the raw gender pay gap decreases. This finding is in line with the strongly regulated wage setting of the Swiss public sector, which leaves little room for gender pay differences.

For the analysis of the unexplained gender pay gap with different estimation methods and model flexibility, we pick five definitions of support that illustrate the trade-offs between comparability, sample size and representativeness.
We indicate the five versions with the respective number in parentheses in Figure 1. Here it is important to note that for a given choice of variables it does not matter in which order support is enforced. All possible orders will lead to exactly the same sample. Support 1 does not change the sample except for excluding two women in the public sector. Thus, it effectively resembles the case without support enforcement. Support 2 enforces support with respect to classical and very important wage determinants. Support 3 covers all quantitatively important wage determinants. Support 4 adds tenure as the only proxy for actual experience, but it does not significantly explain wages. Support 5 is the most extreme case, which enforces full support with respect to all variables. Table B.6 in Online Appendix B shows how imposing stricter support changes the average characteristics of the remaining women with support. The changes are largest with respect to occupation, management position, establishment size, and part-time employment status. This is important for the interpretation of our results. In the presence of heterogeneous pay gaps, we cannot expect estimates to be the same when the sample changes due to stricter enforcement of support.

In panels (a), (c) and (e) of Figures 2 and 3, we show the estimates of the average unexplained gender pay gap for the private and public sector, respectively. The bars show the unexplained gender pay gap estimates of the different methodological choices we consider and the capped lines show the 95% confidence interval for each estimate. As a benchmark, the black horizontal line shows the BO estimate in Support 1 (full sample except for two women in the public sector) with the baseline specification together with its 95% confidence interval (black dashed lines). Furthermore, panels (b), (d) and (f) of both figures show the percentage differences of each estimate from this benchmark.

We find that all of the methodological choices we consider matter greatly.
Compared to the BO benchmark, using a less flexible estimator like the LRM increases the estimated unexplained gender pay gap by up to 29%. In contrast, increasing flexibility by including wage determinants more flexibly or by using a more flexible estimator together with enforcing common support [...]

Footnote: We do not include exact matching (EXM) as EXM does not control for all wage determinants in Supports (1)-(4), but EXM has already been discussed in the context of Figure 1. Tables D.1 and D.2 in Online Appendix D report all estimates including EXM with their standard errors for the private and public sector, respectively.

[Figure 2 panels: (a) Baseline model: estimates; (b) Baseline model: difference to benchmark; (c) Full model: estimates; (d) Full model: difference to benchmark; (e) ML model: estimates; (f) ML model: difference to benchmark]

Notes: We report all estimates and their standard errors in Table D.1 in Online Appendix D. Capped lines indicate the 95% confidence intervals of the estimated unexplained gender pay gap. The black horizontal line in the left panels shows the BO estimate of Support 1 as a benchmark with the baseline specification and its 95% confidence band in black dashed lines. The right panels show the difference from this benchmark in percent, i.e., the difference from the BO estimate using Support 1 with baseline model specification. PDS is the abbreviation for post-double-selection procedure.

[Figure 3 panels: (a) Baseline model: estimates; (b) Baseline model: difference to benchmark; (c) Full model: estimates; (d) Full model: difference to benchmark; (e) ML model: estimates; (f) ML model: difference to benchmark]

Notes: We report all estimates and their standard errors in Table D.2 in Online Appendix D. Capped lines indicate the 95% confidence intervals of the estimated unexplained gender pay gap. The black horizontal line in the left panels shows the BO estimate of Support 1 as a benchmark with the baseline specification and its 95% confidence band in black dashed lines. The right panels show the difference from this benchmark in percent, i.e., the difference from the BO estimate using Support 1 with baseline model specification. PDS is the abbreviation for post-double-selection procedure.

Figure 4 shows the differences in the estimated unexplained gender pay gaps for each support definition relative to Support 1. We report separate results for each estimator, but pool the results for the different model specifications (baseline, full, ML) for a better overview.

Figure 4: Difference in estimated unexplained gender pay gap relative to Support 1

Notes: Difference in percent relative to the estimate for Support 1 using the respective estimator. The results for the model specifications (baseline, full, ML) are pooled.

The estimated pay gaps shrink substantially and, with few unsystematic exceptions, monotonically with stricter support enforcement. For the BO estimates, enforcing support with respect to all quantitatively important wage determinants (Support 3) reduces the estimated pay gap by around 6% in both sectors, while enforcing full support with respect to all variables (Support 5) reduces estimated gaps by around 26%. With 13-35% for Support 3 and 5-50% for Support 5, the differences are considerably larger for the semi-parametric estimators IPW, PSM and EXPSM. Overall, enforcing common support has a very strong impact on the size of the estimated unexplained gender pay gaps.

There are two possible explanations for this finding. First, stricter support enforcement makes women and men more comparable. Hence, differences in observed wage determinants are likely to explain a larger share of the raw gender pay gap. This is particularly true for the public sector, where a large part of the raw gender pay gap can be explained by lack of support directly (see Figure 1). Second, heterogeneity in unexplained gender pay gaps can explain the differences across support samples that differ increasingly in their composition (see Table B.6 in Online Appendix B). For example, stricter support enforcement increasingly removes women in lower-paying jobs. As a result, the unexplained gender pay gap could either increase or decrease, depending on whether the unexplained gender pay gap is larger for women in higher- or lower-paying jobs. However, the direction cannot be inferred from data, because we cannot estimate the unexplained gender pay gap for the omitted women who lack comparable men.

The results are consistent with the findings of Ñopo (2008), who shows that 11% of Peru's gender pay gap estimates can be explained by lack of common support with regard to age, education, marital status, and migration condition.
We document that support violations can be much more severe when accounting for a richer set of wage determinants. Furthermore, we show that all estimators are affected by support violations, including the parametric estimators that rely on extrapolation.

Controlling for wage determinants more flexibly is also important, especially for the parametric estimators. Figure 5 shows the percentage differences in the estimated unexplained gender pay gaps between the baseline model of the respective estimator and the full and ML model specifications. We report separate results for each estimator, but pool the results of the different support definitions for a better overview.

Figure 5: Difference in estimated unexplained gender pay gap relative to baseline model

Notes: Difference in percent of the baseline model estimate with the respective estimator. The results for the different support definitions are pooled.

We find that the most flexible full model specification reduces the BO estimate of the pay gap by 6% in the private sector and 19% in the public sector. The reductions are only slightly smaller for the ML specification, which selects variables from the full model in a data-driven way to increase efficiency.

Model flexibility matters most for the parametric estimators LRM and BO, as well as for AIPW, which can be viewed as an extension of BO with semi-parametric small-sample bias adjustment. All three estimators model the wage equation to estimate the unexplained gender pay gap. In contrast, the semi-parametric estimators IPW, PSM and EXPSM are much less sensitive to how wage determinants are included. All three estimators only use the propensity score as an input factor and do not impose any functional form on the wage equation. For those estimators, the estimated pay gaps differ by less than 8% from the baseline model. The most flexible one, EXPSM, exhibits only very small differences across model specifications: only 0.4% in the private sector and 3% in the public sector. For all estimators, the differences between the full and ML models are not very large. A noticeable exception is the IPW estimator, where the unexplained gender pay gap of the ML specification exceeds even that of the baseline specification. Knaus et al. (2020) document low finite sample performance of IPW in combination with ML, which may explain this result.

Kline (2011) provides a theoretical justification for the sensitivity of the BO estimator with regard to model flexibility. He argues that BO is more vulnerable to model misspecification than IPW, because the implicit weighting scheme of BO allows for negative weights (which is not permitted for IPW). When the BO model is specified more flexibly, though, negative implicit weights become less likely.

We draw three conclusions from these results. First, controlling for wage determinants in a flexible way is important.
Second, the more ﬂexibly they are included, the better; misspeciﬁ-cation of functional forms is less likely and the loss of degrees of freedom is not costly in ourvery large data set (but this might be diﬀerent in smaller data sets). Third, ﬂexible inclusionof wage determinants is more important for the estimators that incorporate the wage equationthan for estimators that only incorporate the propensity score. With our very large database,we also ﬁnd that applying machine learning methods for variable selection from a very rich setof non-linear and interaction terms has only a small impact on the estimated pay gaps. Thismay be diﬀerent, though, in smaller samples where eﬃciency is more of a concern.
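The role of functional form in the BO estimate can be illustrated with a minimal sketch. It is not the implementation used in the paper: the data are hypothetical toy values, there is a single discrete wage determinant `x`, men serve as the reference group, and the "flexible" specification is simply one dummy per value of `x`. All names are illustrative assumptions.

```python
from statistics import mean

# Hypothetical toy data: (x, wage). The male wage profile is convex in x,
# and women earn exactly 1.0 less than men in every cell.
men = [(0, 10.0), (0, 10.0), (1, 11.0), (2, 14.0), (2, 14.0)]
women = [(0, 9.0), (1, 10.0), (1, 10.0), (1, 10.0), (2, 13.0)]

def linear_counterfactual(reference, target):
    """Fit wage = a + b*x on the reference group by OLS and
    return the mean prediction at the target group's x values."""
    xs = [x for x, _ in reference]
    ys = [y for _, y in reference]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in reference)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return mean(intercept + slope * x for x, _ in target)

def saturated_counterfactual(reference, target):
    """One dummy per value of x: the prediction is the reference cell mean.
    Raises an error if a target cell contains no reference observation."""
    cell_mean = {x: mean(y for xr, y in reference if xr == x) for x, _ in target}
    return mean(cell_mean[x] for x, _ in target)

women_mean = mean(y for _, y in women)
# Unexplained gap = counterfactual male wage at women's x minus women's wage.
gap_linear = linear_counterfactual(men, women) - women_mean
gap_flexible = saturated_counterfactual(men, women) - women_mean
print(f"linear BO: {gap_linear:.2f}, flexible BO: {gap_flexible:.2f}")
```

In this toy example the linear specification reports an unexplained gap of 1.40, while the saturated specification recovers the gap of 1.00 that was built into the data. The difference arises only because the linear wage equation cannot capture the convex wage profile.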

The last dimension we vary is estimator choice. Figure 6 shows the differences in the estimated unexplained gender pay gaps between BO and the respective estimator for each support. As the flexibility of including wage determinants affects the parametric and semi-parametric estimators differently, we focus on the results from the full model with maximum flexibility for all estimators. This implies that the differences that we observe across estimators mainly result from differences in the way the estimators restrict possible heterogeneity in unexplained gender pay gaps. The LRM imposes homogeneous gender pay gaps. BO allows for heterogeneous gender pay gaps that are driven by gender differences in the returns to the included wage determinants. In contrast, the semi-parametric estimators IPW, PSM and EXPSM do not restrict heterogeneity in the pay gaps at all. AIPW is a mixture between the parametric BO and a small-sample bias adjustment based on the semi-parametric IPW. With our very large samples, small-sample bias should be negligible and BO and AIPW should yield very similar results.

Figure 6: Difference in estimated unexplained gender pay gap relative to BO in respective support

Notes: Difference in percent of the BO estimate in the respective support sample. All estimates are for the full model, which is the most flexible specification.
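To make the distinction between a homogeneous and an unrestricted pay gap concrete, the following sketch compares the two on hypothetical toy data. It is an illustration under stated assumptions, not the paper's estimators: wages depend on a single categorical `cell`, the LRM is a regression of wages on a female dummy plus cell dummies (computed via the Frisch-Waugh-Lovell result), and the unrestricted estimate averages cell-specific gaps over the women's distribution.

```python
from statistics import mean

# Hypothetical records: (cell, female, wage).
# The male-female gap is 3 in cell "a" (few women) and 1 in cell "b" (many women).
data = (
    [("a", 0, 10.0)] * 4 + [("a", 1, 7.0)] * 1
    + [("b", 0, 20.0)] * 1 + [("b", 1, 19.0)] * 4
)

def lrm_gap(records):
    """Male-female gap from wage ~ female + cell dummies (one common gap).

    Frisch-Waugh-Lovell: demean female and wage within each cell, then
    regress the wage residual on the female residual. The sign is flipped
    so that a positive number means men earn more.
    """
    num = den = 0.0
    for cell in {c for c, _, _ in records}:
        rows = [(f, w) for c, f, w in records if c == cell]
        f_bar = mean(f for f, _ in rows)
        w_bar = mean(w for _, w in rows)
        num += sum((f - f_bar) * (w - w_bar) for f, w in rows)
        den += sum((f - f_bar) ** 2 for f, _ in rows)
    return -num / den

def heterogeneous_gap(records):
    """Cell-specific gaps averaged over the women's cell distribution."""
    gaps = []
    for c, f, w in records:
        if f == 1:
            male_mean = mean(wm for cm, fm, wm in records if cm == c and fm == 0)
            gaps.append(male_mean - w)
    return mean(gaps)

print(f"LRM: {lrm_gap(data):.2f}, unrestricted: {heterogeneous_gap(data):.2f}")
```

Here the LRM yields 2.00 and the unrestricted average 1.40. The LRM weights the two cells equally in this example, whereas the unrestricted estimate puts most weight on the cell where most women work.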

We find that for each support, LRM estimates of pay gaps are 8-18% higher than the BO estimate. The differences between the estimated BO and AIPW pay gaps are small, as expected with our large samples. This may be different in smaller samples, though. For the matching estimators PSM and EXPSM, we estimate pay gaps that are systematically and substantially smaller than the corresponding BO estimates on the same support, by up to 20% in the private and up to 34% in the public sector. This is in line with the substantial heterogeneity in estimated pay gaps that is widely acknowledged in the literature (e.g. Bach et al., 2018; Chernozhukov et al., 2018a; Goldin, 2014).

However, we find no systematic pattern for IPW. In the private sector, we find estimates smaller than with BO and similar to PSM and EXPSM for supports 1-4, but much larger estimates for support 5. In the public sector, we obtain estimates that are 5-24% larger than the BO estimates for supports 1-4 and much smaller for full support 5. The results for IPW show that this estimator is very sensitive to the population studied.

We conclude that it is important to use a semi-parametric estimator that does not restrict pay gap heterogeneity. Moreover, among the semi-parametric estimators, PSM and EXPSM are much more robust and precise than IPW. The differences between PSM and EXPSM are small in most cases. With a sufficient number of observations, EXPSM is preferable because, among all estimators considered, it best ensures comparability of women and men. Moreover, EXPSM is the least sensitive to the way we include the observed wage determinants because exact matching on the variables that define support fully removes all differences in these variables without imposing any restrictions.

These findings are consistent with the results of Frölich (2007), who documents that the PSM estimates of the unexplained gender pay gap in the UK are up to 29% smaller than the BO estimates. Black et al. (2008) also find strong differences between BO and EXM estimates of the unexplained gender pay gap in the United States. In particular, the estimates decline by 18% for white women, 92% for Hispanic women, and 83% for Asian women. However, for black women they find no strong difference between the BO and EXM estimates. Likewise, Ñopo (2008) and Goraus et al. (2017) find no strong differences in the BO and EXM estimates of the unexplained gender pay gap in Peru and Poland, respectively, but they only account for a very small set of wage determinants. In Figure 1, we show that the selection of control variables is crucial for EXM.
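The logic of exact matching with enforced common support can be sketched as follows. This is a simplified illustration on hypothetical data, not the EXM or EXPSM implementation of the paper: each `cell` stands for one combination of the wage determinants that define support, gaps are measured in wage units rather than log points, and cell gaps are weighted by the number of women.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (female, cell, wage).
records = [
    (0, "clerk", 30.0), (0, "clerk", 32.0), (1, "clerk", 29.0), (1, "clerk", 29.0),
    (0, "engineer", 50.0), (1, "engineer", 46.0),
    (0, "driver", 28.0),                      # men only
    (1, "nurse", 35.0), (1, "nurse", 37.0),   # women only: off support
]

def exact_matching_gap(rows):
    """Return (unexplained gap on common support, share of women off support)."""
    wages = defaultdict(lambda: {0: [], 1: []})
    for female, cell, wage in rows:
        wages[cell][female].append(wage)
    # A cell is on support only if it contains both men and women.
    on_support = {c: g for c, g in wages.items() if g[0] and g[1]}
    n_women = sum(len(g[1]) for g in wages.values())
    n_matched = sum(len(g[1]) for g in on_support.values())
    # Weight each cell's male-female gap by its number of women.
    gap = sum(
        len(g[1]) * (mean(g[0]) - mean(g[1])) for g in on_support.values()
    ) / n_matched
    return gap, 1 - n_matched / n_women

gap, share_lost = exact_matching_gap(records)
print(f"matched gap: {gap:.2f}, women off support: {share_lost:.0%}")
```

In this toy sample, two of five women work in a cell without any men, so 40% of women are lost when support is enforced, and the matched gap of 2.67 is informative only about the remaining women. Adding more variables to the cell definition makes matched men and women more comparable but increases the share of women off support.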

A legitimate question is whether the results we obtain with the Swiss data are relevant for other applications. In terms of data, ours are very similar to those of the European Union Structure of Earnings Survey (SES). The SES provides harmonized data on earnings in EU member states, candidate countries, and EFTA countries. It includes all of the wage determinants that have been used to define Supports 1-4 (but not 5), which covers all quantitatively important wage determinants. Moreover, as in the Swiss data, actual work experience is not included. Studies for the US typically use survey data collected from individuals, such as the Current Population Survey (CPS), the US Census, or the American Community Survey (ACS). Such surveys contain much richer information on individuals but potentially suffer from measurement error in self-reported wages. However, the key wage determinants we observe are included as well, and most studies use a similar set of variables. In terms of sample size, all of the data sets are also very large, containing several hundred thousand observations.

With respect to labor market institutions, Switzerland is a particularly interesting case. On the one hand, it has generous social insurance systems like many other European countries. On the other hand, it has a very flexible labor market that is much less regulated than that of other European countries, which makes it more comparable to countries like the US. Thus, Switzerland shares important features of both typically European labor markets and flexible US-type labor markets. The differences we find between the private and the public sectors in Switzerland are also likely to be relevant for other countries. Public sectors in most countries share the features that explain the differences in the Swiss case. They are typically more homogeneous in terms of covered occupations with strong concentration in certain service sectors, they typically attract a higher share of women than the private sector, they exhibit higher shares of high-skilled workers, and they are typically more regulated, including their wage setting, than private sectors (see Brindusa et al., 2012).

In summary, the data and labor market institutions in Switzerland are in many ways comparable to those in other countries. Therefore, we expect that our qualitative results extend to other settings as well. Of course, the magnitudes of gender pay gaps differ greatly across countries (see e.g. Weichselbaumer and Winter-Ebmer, 2005; Van der Velde et al., 2015). Hence, we also expect the quantitative impact of methodological choices to differ. However, given that most countries share key features of female employment and gender segregation in the labor market, we are confident that the methodological choices we study are relevant beyond the Swiss context.

Our results suggest that deciding how to enforce common support is the first important choice in applied research about the gender pay gap. Researchers face a trade-off between stricter support to improve ex-ante comparability, and sample size. The latter affects both the precision of the estimate and the representativeness of the study sample. We recommend starting with an analysis of common support using the approach of Ñopo (2008), adding variables in decreasing order of their impact on counterfactual male wages. Based on this analysis, there are two possible criteria for choosing the set of variables for enforcing support. One option is to use all quantitatively important wage determinants. In our application, this would result in support definition 3 with 39% of employed women lost in the private sector and 20% lost in the public sector, which is quite substantial. Alternatively, one could base the decision on the importance of wage determinants for explaining the raw gender pay gap rather than the counterfactual male wage. In this case, the researcher could choose the point at which the unexplained gender pay gap obtained from exact matching stabilizes. In our data, this is the case somewhere between Supports 2 and 3 (see Figure 1). As sample size drops considerably after enforcement of Support 2 plus irregular payments, this would be a natural choice of the set of variables for enforcing support. This would exclude 19% of employed women in the private sector and 10% in the public sector, with the composition of the study sample being quite similar to that of the full sample.

The next important decision is which estimator to use. The results indicate that restricting how observed wage determinants and gender may affect wages can have a large impact on the estimated unexplained gender pay gap. When using BO, including wage determinants in a flexible way is crucial. However, with a sufficiently large sample, a flexible matching estimator is even better, as matching does not restrict gender pay gap heterogeneity in any way. The results suggest that combining exact matching on a few very important wage determinants that also define support with radius matching on a flexibly specified propensity score works particularly well. It minimizes the risk of functional form misspecification and, with support defined as described in the last paragraph, offers a good balance between ensuring comparability, precision and representativeness of the study sample.

Implementing these recommendations with our data would affect the estimated unexplained gender pay gap in the following way. Consider Support 3 as the more conservative of the two alternatives discussed above. In the private sector, enforcing Support 3 reduces the raw gender pay gap of 18.6% by only 1%. Standard BO with the baseline specification for including wage determinants that ignores support explains 58% of the raw gap in the full sample and results in an estimate of the unexplained pay gap of 7.7%. Enforcing Support 3 reduces this estimate by 5%, using the most flexible instead of the baseline specification for BO by another 5%, and using semi-parametric EXPSM with the same flexible specification instead by another 13%. In total, the methodological choices reduce the unexplained gender pay gap estimates by 23%. Flexible EXPSM yields an estimated unexplained gap of 6% and explains 68% of the raw wage gap.

The differences are even larger for the public sector. Enforcing Support 3 already reduces the raw gender pay gap of 13.9% by 14%. Standard BO without support enforcement explains 54% of the raw gap in the full sample and yields an estimate of the unexplained pay gap of 6.4%. Enforcing Support 3 reduces this estimate by 4%. Using the most flexible instead of the baseline specification for BO has a strong impact, as it reduces the estimate by another 19%. Using semi-parametric EXPSM with the same flexible specification instead leads to another even more substantial reduction of 26%. In total, the methodological choices reduce the estimates of the unexplained gender pay gap by 50%. Flexible EXPSM yields an estimated unexplained gap of only 3.2% and explains 77% of the raw wage gap. This once again illustrates the importance of these methodological choices.
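The arithmetic behind these totals can be retraced with a short sketch. It rests on two assumptions that are not stated explicitly in the text: each step is read as a percentage of the baseline BO estimate in the full sample (so that the steps add up), and the explained share is computed relative to the raw gap in the full sample. The inputs are the rounded figures quoted above, so the results match the reported numbers only approximately. All names are illustrative.

```python
# Rounded figures quoted in the text, in percent.
SECTORS = {
    "private": {"raw_gap": 18.6, "bo_baseline": 7.7, "steps": [5, 5, 13]},
    "public": {"raw_gap": 13.9, "bo_baseline": 6.4, "steps": [4, 19, 26]},
}

def summarize(raw_gap, bo_baseline, steps):
    """Combine the step-wise reductions (support, flexibility, estimator).

    Assumption: every step is a percentage of the baseline BO estimate,
    so the total reduction is the plain sum of the steps.
    """
    total_reduction = sum(steps)
    final_gap = bo_baseline * (1 - total_reduction / 100)
    explained_share = 100 * (1 - final_gap / raw_gap)
    baseline_explained = 100 * (1 - bo_baseline / raw_gap)
    return total_reduction, final_gap, explained_share, baseline_explained

for name, figures in SECTORS.items():
    total, final_gap, explained, base_explained = summarize(**figures)
    print(
        f"{name}: total reduction {total}%, unexplained gap {final_gap:.1f}%, "
        f"explained share {explained:.0f}% (baseline BO {base_explained:.0f}%)"
    )
```

Under these assumptions the private-sector steps sum to 23% and imply an unexplained gap of about 5.9%, in line with the reported 6%. The public-sector steps sum to 49% rather than the reported 50%, a difference that is consistent with rounding of the individual steps.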

We study the sensitivity of estimates of the unexplained gender pay gap to three types of methodological choices: enforcement of comparability of men and women ex ante, flexibility in the inclusion of wage determinants, and choice of estimator. We find that all of these choices matter greatly. Implementing the choices we recommend based on our results, using data for the Swiss private sector, explains 16% more of the raw wage gap than standard BO estimates and results in estimated unexplained pay gaps that are 23% lower. For the public sector, the preferred set of choices explains 42% more of the raw wage gap than standard BO estimates and results in estimated unexplained pay gaps that are 50% lower.

An important takeaway from our study is that the commonly reported Blinder-Oaxaca estimates of the unexplained gender pay gap can be misleading for several reasons. There is a high risk that they compare women to non-comparable men to an extent that is quantitatively important. Moreover, they hide gender inequalities that result from this lack of comparability. For women who lack support, we know nothing about possible inequalities in pay. Moreover, lack of support shows the extent of labor market segregation as an additional source of gender inequality. This is ignored if researchers do not check for common support. More generally, the restrictions standard BO imposes bear a high risk of yielding biased estimates of the unexplained gender pay gap. In particular, they underestimate the full extent of heterogeneity in pay gaps that is important for targeting affirmative action campaigns. Future work should study this heterogeneity in more detail.

As a final remark, we would like to emphasize that our findings provide guidance on how to estimate unexplained gender pay gaps with a given set of observed wage determinants. The results from such estimations are informative about equal pay for equal work taking individual choices as a given, subject to any omitted variable bias that may result from unobserved factors. Actual experience, having children, and personality traits are examples of such unobserved factors in the data we have used. Uncovering the extent and sources of possible gender discrimination in the labor market requires much more than accounting for unobserved wage determinants, though, such as dealing with selection into employment and endogenous control variables.

References

Adda, J., C. Dustmann, and K. Stevens, “The Career Costs of Children,” Journal of Political Economy, 2017, , 293–337.

Anderson, D.J., M. Binder, and K. Krause, “The Motherhood Wage Penalty: Which Mothers Pay It and Why?,” American Economic Review, 2002, , 354–358.

Angelov, N., P. Johansson, and E. Lindahl, “Parenthood and the Gender Gap in Pay,” Journal of Labor Economics, 2016, , 545–579.

Autor, D., D. Figlio, K. Karbownik, J. Roth, and M. Wasserman, “Family Disadvantage and the Gender Gap in Behavioral and Educational Outcomes,” American Economic Journal: Applied Economics, 2019, , 338–381.

Bach, P., V. Chernozhukov, and M. Spindler, “Closing the U.S. Gender Wage Gap Requires Understanding its Heterogeneity,” arXiv:1812.04345, 2018.

Bailey, M.J., B. Hershbein, and A.R. Miller, “The Opt-In Revolution? Contraception and the Gender Gap in Wages,” American Economic Journal: Applied Economics, 2012, , 225–254.

Barón, J.D. and D.A. Cobb-Clark, “Occupational Segregation and the Gender Wage Gap in Private- and Public-Sector Employment: A Distributional Analysis,” Economic Record, 2010, , 227–246.

Barsky, R., J. Bound, K. Charles, and J. Lupton, “Accounting for the Black-White Wealth Gap: A Nonparametric Approach,” Journal of the American Statistical Association, 2002, , 663–673.

Bayard, K., J. Hellerstein, D. Neumark, and K. Troske, “New Evidence on Sex Segregation and Sex Differences in Wages from Matched Employer-Employee Data,” Journal of Labor Economics, 2003, 21 (4), 887–922.

Beaudry, P. and E. Lewis, “Do Male-Female Wage Differentials Reflect Differences in the Return to Skill? Cross-City Evidence from 1980-2000,” American Economic Journal: Applied Economics, 2014, , 178–194.

Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen, “Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain,” Econometrica, 2012, , 2369–2429.

Belloni, A., V. Chernozhukov, and C. Hansen, “Inference on Treatment Effects After Selection Amongst High-Dimensional Controls (with an Application to Abortion and Crime),” Review of Economic Studies, 2013, , 608–650.

Belloni, A., V. Chernozhukov, and C. Hansen, “High-Dimensional Methods and Inference on Treatment and Structural Effects in Economics,” Journal of Economic Perspectives, 2014, , 29–50.

Bertrand, M. and J. Pan, “The Trouble with Boys: Social Influences and the Gender Gap in Disruptive Behavior,” American Economic Journal: Applied Economics, 2013, , 32–64.

Bertrand, M. and K.F. Hallock, “The Gender Gap in Top Corporate Jobs,” ILR Review, 2001, , 3–21.

Bertrand, M., C. Goldin, and L.F. Katz, “Dynamics of the Gender Gap for Young Professionals in the Financial and Corporate Sectors,” Review of Economic Studies, 2010, , 228–255.

Bertrand, M., E. Kamenica, and J. Pan, “Gender Identity and Relative Income within Households,” Quarterly Journal of Economics, 2015, , 571–614.

Bertrand, M., S.E. Black, S. Jensen, and A. Lleras-Muney, “Breaking the Glass Ceiling? The Effect of Board Quotas on Female Labour Market Outcomes in Norway,” Review of Economic Studies, 2019, , 191–239.

Black, D.A., A. Haviland, S.G. Sanders, and L.J. Taylor, “Gender Wage Disparities among the Highly Educated,” Journal of Human Resources, 2008, , 630–659.

Blau, F.D. and L.M. Kahn, “Gender Differences in Pay,” Journal of Economic Perspectives, 2000, , 75–99.

Blau, F.D. and L.M. Kahn, “The Gender Wage Gap: Extent, Trends, and Explanations,” Journal of Economic Literature, 2017, , 789–865.

Blinder, A., “Wage Discrimination: Reduced Form and Structural Estimates,” Journal of Human Resources, 1973, , 436–455.

Bonjour, D. and M. Gerfin, “The Unequal Distribution of Unequal Pay – An Empirical Analysis of the Gender Wage Gap in Switzerland,” Empirical Economics, 2001, , 407–427.

Brenøe, A.A. and S. Lundberg, “Gender Gaps in the Effects of Childhood Family Environment: Do they Persist into Adulthood?,” European Economic Review, 2018, , 32–62.

Brenzel, H., H. Gartner, and C. Schnabel, “Wage Bargaining or Wage Posting? Evidence from the Employers’ Side,” Labour Economics, 2014, , 41–48.

Briel, S. and M. Töpfer, “The Gender Pay Gap Revisited: Does Machine Learning offer New Insights?,” Working Paper, 2020.

Brindusa, A., S. de la Rica, and J.J. Dolado, “The Effect of Public Sector Employment on Women’s Labour Market Outcomes,” Washington, DC: World Bank, 2012.

Bruns, B., “Changes in Workplace Heterogeneity and How They Widen the Gender Wage Gap,” American Economic Journal: Applied Economics, 2019, , 74–113.

Buffington, C., B. Cerf, C. Jones, and B.A. Weinberg, “STEM Training and Early Career Outcomes of Female and Male Graduate Students: Evidence from UMETRICS Data Linked to the 2010 Census,” American Economic Review, 2016, , 333–338.

Busso, M., J. DiNardo, and J. McCrary, “New Evidence on the Finite Sample Properties of Propensity Score Reweighting and Matching Estimators,” Review of Economics and Statistics, 2014, , 885–897.

Bütikofer, A., S. Jensen, and K.G. Salvanes, “The Role of Parenthood on the Gender Gap among Top Earners,” European Economic Review, 2018, , 109–123.

Card, D., A.R. Cardoso, and P. Kline, “Bargaining, Sorting, and the Gender Wage Gap: Quantifying the Impact of Firms on the Relative Pay of Women,” Quarterly Journal of Economics, 2016, , 633–686.

Chandrasekhar, A., V. Chernozhukov, F. Molinari, and P. Schrimpf, “Best Linear Approximations to Set Identified Functions: With an Application to the Gender Wage Gap,” cemmap Working Paper CWP09/19, 2019.

Chernozhukov, V., I. Fernández-Val, and S. Luo, “Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK,” arXiv:1811.11603, 2020.

Chernozhukov, V., I. Fernández-Val, and Y. Luo, “The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages,” Econometrica, 2018, , 1911–1938.

Chernozhukov, V., I. Fernández-Val, and B. Melly, “Inference on Counterfactual Distributions,” Econometrica, 2013, , 2205–2268.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, and Whitney Newey, “Double/Debiased/Neyman Machine Learning of Treatment Effects,” American Economic Review, 2017, , 261–265.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins, “Double/Debiased Machine Learning for Treatment and Structural Parameters,” Econometrics Journal, 2018, , C1–C68.

Chetverikov, D., Z. Liao, and V. Chernozhukov, “On Cross-Validated Lasso,” arXiv:1605.02214, 2017.

Cook, C., R. Diamond, J.V. Hall, J.A. List, and P. Oyer, “The Gender Earnings Gap in the Gig Economy: Evidence from over a Million Rideshare Drivers,” Review of Economic Studies, 2020, forthcoming.

Djurdjevic, D. and S. Radyakin, “Decomposition of the Gender Wage Gap Using Matching: An Application for Switzerland,” Swiss Journal of Economics and Statistics, 2007, , 365–396.

Ejrnæs, M. and A. Kunze, “Work and Wage Dynamics around Childbirth,” Scandinavian Journal of Economics, 2013, , 856–877.

Fernández, R. and J. Wong, “Unilateral Divorce, the Decreasing Gender Gap, and Married Women’s Labor Force Participation,” American Economic Review, 2014, , 342–347.

Fitzenberger, B., S. Steffes, and A. Strittmatter, “Return-to-Job During and After Parental Leave,” International Journal of Human Resource Management, 2016, , 803–831.

Flory, J.A., A. Leibbrandt, and J.A. List, “Do Competitive Workplaces Deter Female Workers? A Large-Scale Natural Field Experiment on Job Entry Decisions,” Review of Economic Studies, 2015, , 122–155.

Fortin, N.M., “The Gender Wage Gap among Young Adults in the United States: The Importance of Money versus People,” Journal of Human Resources, 2008, , 884–918.

Fortin, N.M., T. Lemieux, and S. Firpo, “Decomposition Methods in Economics,” in O. Ashenfelter and D. Card, eds., Handbook of Labor Economics, Vol. 4A, North-Holland, 2011, pp. 1–102.

Frölich, M., “Propensity Score Matching without Conditional Independence Assumption - With an Application to the Gender Wage Gap in the United Kingdom,” Econometrics Journal, 2007, , 359–407.

Gayle, G.-L. and L. Golan, “Estimating a Dynamic Adverse-Selection Model: Labour-Force Experience and the Changing Gender Earnings Gap 1968–1997,” Review of Economic Studies, 2012, , 227–267.

Gneezy, U., K.L. Leonard, and J.A. List, “Gender Differences in Competition: Evidence From a Matrilineal and a Patriarchal Society,” Econometrica, 2009, , 1637–1664.

Gneezy, U., M. Niederle, and A. Rustichini, “Performance in Competitive Environments: Gender Differences,” Quarterly Journal of Economics, 2003, , 1049–1074.

Gobillon, L., D. Meurs, and S. Roux, “Estimating Gender Differences in Access to Jobs,” Journal of Labor Economics, 2015, , 317–363.

Goldin, C., “A Grand Gender Convergence: Its Last Chapter,” American Economic Review, 2014, , 1091–1119.

Goldin, C. and J. Mitchell, “The New Life Cycle of Women’s Employment: Disappearing Humps, Sagging Middles, Expanding Top,” Journal of Economic Perspectives, 2017, , 161–182.

Goldin, C., S.P. Kerr, C. Olivetti, and E. Barth, “The Expanding Gender Earnings Gap: Evidence from the LEHD-2000 Census,” American Economic Review, 2017, , 110–114.

Goraus, K., J. Tyrowicz, and L. Van der Velde, “Which Gender Wage Gap Estimates to Trust? A Comparative Analysis,” Review of Income and Wealth, 2017, , 118–146.

Graham, B.S., C.C.X. Pinto, and D. Egel, “Efficient Estimation of Data Combination Models by the Method of Auxiliary-to-Study Tilting,” Journal of Business and Economic Statistics, 2016, , 288–301.

Hastie, T., R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, 2009.

Hastie, T., R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations, CRC Press, 2016.

Heinze, A. and E. Wolf, “The Intra-Firm Gender Wage Gap: A New View on Wage Differentials based on Linked Employer–Employee Data,” Journal of Population Economics, 2010, , 851–879.

Hirano, Keisuke, Guido W. Imbens, and Geert Ridder, “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 2003, , 1161–1189.

Horvitz, D.G. and D.J. Thompson, “A Generalization of Sampling Without Replacement from a Finite Universe,” Journal of the American Statistical Association, 1952, , 663–685.

Huber, M., “Causal Pitfalls in the Decomposition of Wage Gaps,” Journal of Business and Economic Statistics, 2015, , 179–191.

Huber, M. and A. Solovyeva, “On the Sensitivity of Wage Gap Decompositions,” Journal of Labor Research, 2020, , 1–33.

Khan, S. and E. Tamer, “Irregular Identification, Support Conditions and Inverse Weight Estimation,” Econometrica, 2010, , 2021–2042.

Kleven, H., C. Landais, and J.E. Søgaard, “Children and Gender Inequality: Evidence from Denmark,” American Economic Journal: Applied Economics, 2019, , 181–209.

Kleven, H., C. Landais, J. Posch, A. Steinhauer, and J. Zweimüller, “Child Penalties across Countries: Evidence and Explanations,” AEA Papers & Proceedings, 2019, , 122–126.

Kline, P., “Oaxaca-Blinder as a Reweighting Estimator,” American Economic Review, 2011, , 532–537.

Knaus, M.C., “Double Machine Learning based Program Evaluation under Unconfoundedness,” arXiv:2003.03191, 2020.

Knaus, M.C., M. Lechner, and A. Strittmatter, “Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence,” Econometrics Journal, 2020, forthcoming.

Krapf, M., A. Roth, and M. Slotwinski, “The Effect of Childcare on Parental Earnings Trajectories,” CESifo Working Paper 8764, 2020.

Kunze, A., “Gender Wage Gap Studies: Consistency and Decomposition,” Empirical Economics, 2008, , 63–76.

Kunze, A., “The Gender Wage Gap in Developed Countries,” in S. L. Averett, L.M. Argys, and S.D. Hoffman, eds., Handbook on Women and the Economy, Oxford University Press, 2018, pp. 63–76.

Künzel, S.R., J.A. Sekhon, P.J. Bickel, and B. Yu, “Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning,” Proceedings of the National Academy of Sciences (PNAS), 2019, , 4156–4165.

Lechner, M. and A. Strittmatter, “Practical Procedures to Deal with Common Support Problems in Matching Estimation,” Econometric Reviews, 2019, , 193–207.

Lechner, M. and C. Wunsch, “Are Training Programs More Effective When Unemployment is High?,” Journal of Labor Economics, 2009, , 653–692.

Lechner, M., R. Miquel, and C. Wunsch, “Long-Run Effects of Public Sector Sponsored Training in West Germany,” Journal of the European Economic Association, 2011, , 742–784.

Lemieux, T., “Occupations, Fields of Study and Returns to Education,” Canadian Journal of Economics, 2014, , 1047–1077.

Liu, K., “Explaining the Gender Wage Gap: Estimates from a Dynamic Model of Job Changes and Hours Changes,” Quantitative Economics, 2016, , 411–447.

Lundborg, P., E. Plug, and A.W. Rasmussen, “Can Women Have Children and a Career? IV Evidence from IVF Treatments,” American Economic Review, 2017, , 1611–1637.

Maasoumi, E. and L. Wang, “The Gender Gap between Earnings Distributions,” Journal of Political Economy, 2019, , 2438–2504.

Machado, C., “Unobserved Selection Heterogeneity and the Gender Wage Gap,” Journal of Applied Econometrics, 2017, , 1348–1366.

Manning, A. and B. Petrongolo, “The Part-Time Pay Penalty for Women in Britain,” Economic Journal, 2008, , F28–F51.

Meara, K., F. Pastore, and A. Webster, “The Gender Pay Gap in the USA: A Matching Study,” Journal of Population Economics, 2020, , 271–305.

Neuman, S. and R.L. Oaxaca, “Wage Decompositions with Selectivity-Corrected Wage Equations: A Methodological Note,” Journal of Economic Inequality, 2004, , 3–10.

Ñopo, H., “Matching as a Tool to Decompose Wage Gaps,” Review of Economics and Statistics, 2008, , 290–299.

Oaxaca, R., “Male-Female Wage Differentials in Urban Labour Markets,” International Economic Review, 1973, , 693–709.

Oberfichtner, M., C. Schnabel, and M. Töpfer, “Do Unions and Works Councils Really Dampen the Gender Pay Gap? Discordant Evidence from Germany,” Economics Letters, 2020, forthcoming.

Olivetti, C. and B. Petrongolo, “Unequal Pay or Unequal Employment? A Cross-Country Analysis of Gender Gaps,” Journal of Labor Economics, 2008, , 621–654.

Olivetti, C. and B. Petrongolo, “The Evolution of Gender Gaps in Industrialized Countries,” Annual Review of Economics, 2016, , 405–434.

Robins, J.M., A. Rotnitzky, and L.P. Zhao, “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed,” Journal of the American Statistical Association, 1994, , 846–866.

Rosenbaum, Paul R. and Donald B. Rubin, “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 1983, , 41–55.

Roth, A. and M. Slotwinski, “Gender Norms and Income Misreporting within Households,” CESifo Working Paper 7298, 2018.

Sin, I., S. Stillman, and R. Fabling, “What Drives the Gender Wage Gap? Examining the Roles of Sorting, Productivity Differences, Bargaining and Discrimination,” Review of Economics and Statistics, 2020, forthcoming.

Tibshirani, R., “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B, 1996, , 267–288.

Van der Velde, L., J. Tyrowicz, and J. Siwinska, “Language and (the Estimates of) the Gender Wage Gap,” Economics Letters, 2015, , 165–170.

Waldfogel, J., “Understanding the “Family Gap” in Pay for Women with Children,” Journal of Economic Perspectives, 1998, , 137–156.

Weichselbaumer, D. and R. Winter-Ebmer, “A Meta-Analysis of the International Gender Wage Gap,” Journal of Economic Surveys, 2005, , 479–511.

Winter-Ebmer, R. and J. Zweimüller, “Unequal Assignment and Unequal Promotion in Job Ladders,” Journal of Labor Economics, 1997, , 43–71.

Yamaguchi, K., “Decomposition of Gender or Racial Inequality with Endogenous Intervening Covariates: An Extension of the DiNardo-Fortin-Lemieux Method,” Sociological Methodology, 2015, , 388–428.

Zou, H., T. Hastie, and R. Tibshirani, “On the ‘Degrees of Freedom’ of the Lasso,” Annals of Statistics, 2007, 35 (5)