Canonical Correlation and Assortative Matching: A Remark
ARNAUD DUPUY§ AND ALFRED GALICHON†

Abstract. In the context of the Beckerian theory of marriage, when men and women match on a single-dimensional index that is the weighted sum of their respective multivariate attributes, many papers in the literature have used linear canonical correlation, and related techniques, in order to estimate these weights. We argue that this estimation technique is inconsistent and suggest some solutions.
Keywords: matching, marriage, assignment, assortative matching, canonical correlation.

JEL codes: C78, D61, C13.

§ CEPS/INSTEAD, Maastricht School of Management and IZA. Address: CEPS/INSTEAD, 3, avenue de la Fonte - L-4364 Esch-sur-Alzette, Luxembourg. Email: [email protected]. Tel: +352585855551, Fax: +352585855700.

† Sciences Po Paris, Department of Economics, CEPR and IZA. Address: 27 rue Saint-Guillaume, 75007 Paris, France. E-mail: [email protected]. Tel: +33(0)145498582, Fax: +33(0)145497257.
Date: May 23, 2014. The authors thank two anonymous referees, the Editor (Xavier D’Haultfoeuille), as well as Xavier Gabaix, Bernard Salanié and Marko Terviö for helpful comments. Galichon’s research has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no 313699, and from FiME, Laboratoire de Finance des Marchés de l’Energie. Dupuy warmly thanks the Maastricht School of Management where part of this research was performed.

Introduction. Who marries whom and why are questions that have received tremendous attention from scientists in many different fields: economics, sociology, psychology and biology. This literature shows that a correlation between spouses’ attributes exists for many attributes, e.g. height, weight, education, earnings, wealth, religion, ethnicity and personality traits, to mention just a few. How many and which of these attributes actually matter for the sorting of men and women? Until recently, for lack of a better methodology, the literature dealt with the first question by simply assuming that men and women match on a single-dimensional index that is the weighted sum of their respective multivariate attributes. The second question was then dealt with using linear Canonical Correlation, and related techniques, in order to estimate the weights of the indices for men and women. This paper argues that this estimation technique is inconsistent and suggests some solutions.

Since Becker’s (1973) seminal contribution, the marriage market has been predominantly modeled as a matching market with transferable utility. Men and women are characterized by vectors of attributes denoted respectively $x \in \mathbb{R}^{d_x}$ for men and $y \in \mathbb{R}^{d_y}$ for women. These vectors may incorporate various dimensions such as education, wealth, health, physical attractiveness, personality traits, etc.
It is assumed that when a man with attributes $x$ and a woman with attributes $y$ form a pair, they generate a surplus equal to $\Phi(x, y)$. This surplus is shared endogenously between the two partners. Denoting $P$ and $Q$ the respective probability distributions of attributes of married men and women, it follows from the results of Shapley and Shubik (1972) that the stable matching will maximize

$$\mathbb{E}\left[\Phi(X, Y)\right]$$

with respect to all joint distributions of $(X, Y)$ such that $X \sim P$ and $Y \sim Q$. For convenience, we assume that these distributions are centered: $\int x \, dP(x) = \int y \, dQ(y) = 0$.

Becker went further in the analysis by assuming that sorting occurs on single-dimensional ability indices for men and women, say $\bar{x}$ and $\bar{y}$, which are constructed linearly with respect to the original attributes:

$$\bar{x} = \alpha' x \quad \text{and} \quad \bar{y} = \beta' y,$$

where $\alpha \in \mathbb{R}^{d_x}$ and $\beta \in \mathbb{R}^{d_y}$ are the weights according to which the various attributes enter the respective indices. Following Becker (1973), assume that the matching surplus of individuals of attributes $x$ and $y$, denoted $\Phi(x, y)$, only depends on the indices $\bar{x}$ and $\bar{y}$ and takes the form

$$\Phi(x, y) = \phi(\alpha' x, \beta' y),$$

where $\phi$ is supermodular, that is $\partial^2_{\bar{x}\bar{y}} \phi(\bar{x}, \bar{y}) \geq 0$. As a result, the solution exhibits positive assortative matching, that is, the equilibrium distribution of the attributes across couples is represented by a joint random vector $(X, Y) \sim \pi$ where $\alpha' X$ and $\beta' Y$ are comonotone: the man at percentile $t$ in the distribution of $\alpha' X$ is matched with the woman at percentile $t$ in the distribution of $\beta' Y$. In other words, denoting $F_Z$ the cumulative distribution function of $Z$, we can state as the main assumption of this note that:

Assumption 1. There are weights $\alpha$ and $\beta$ such that the indices $\alpha' X$ and $\beta' Y$ are comonotone, that is

$$F_{\beta' Y}(\beta' Y) = F_{\alpha' X}(\alpha' X).$$

If the cumulative distribution function $F_{\beta' Y}$ is invertible, one may then write

$$\beta' Y = T(\alpha' X),$$

where $T(z) = F^{-1}_{\beta' Y} \circ F_{\alpha' X}(z)$ is a nondecreasing map; thus the ability index of a woman is a nondecreasing function of that of the man she is matched with.

Given this specification and the observation of $(X, Y) \sim \pi$, one would like to estimate $(\alpha, \beta)$. To this end, Becker (1973) suggested (p. 834) using Canonical Correlation Analysis, a technique originally introduced by Hotelling (1936). This method consists in determining the weights $\alpha^c$ and $\beta^c$ that maximize the correlation between $\alpha' X$ and $\beta' Y$. Formally, introducing the notation

$$\Sigma_{XY} = \mathbb{E}_\pi[XY'], \quad \Sigma_X = \mathbb{E}_\pi[XX'], \quad \Sigma_Y = \mathbb{E}_\pi[YY'],$$

Canonical Correlation consists in defining $\alpha^c$ and $\beta^c$ as the maximizers of the correlation of $\alpha' X$ and $\beta' Y$ over all possible vectors of weights $\alpha$ and $\beta$. The problem therefore consists in solving the following program:

$$\max_{\alpha \in \mathbb{R}^{d_x},\, \beta \in \mathbb{R}^{d_y}} \alpha' \Sigma_{XY} \beta \quad \text{s.t.} \quad \alpha' \Sigma_X \alpha = 1 \ \text{and} \ \beta' \Sigma_Y \beta = 1, \tag{1}$$

whose value at the optimum is in general less than or equal to one.

Footnote: Since we are primarily interested in the consistency of Canonical Correlation and related techniques, throughout this paper we assume that the analyst has access to a sample of infinite size.

In the applied literature, $\alpha$ and $\beta$ are frequently estimated by multivariate Ordinary Least Squares (OLS) regression. It is worth remarking that this is closely related, but not quite identical, to Canonical Correlation. Consider the following OLS regression:

$$Y_1 = \alpha' X - \beta_{-1}' Y_{-1} + \varepsilon,$$

where $\varepsilon$ is an error term, $Y_1$ is the top element of $Y$, and $Y_{-1}$ the vector of the remaining entries. Let $\alpha^o$ and $\beta^o_{-1}$ be the coefficients obtained from OLS. Introducing $\beta^o = (1, \beta^{o\prime}_{-1})'$, it is easy to show that $(\alpha^o, \beta^o)$ solves the program

$$\max_{\alpha \in \mathbb{R}^{d_x},\, \beta \in \mathbb{R}^{d_y}} \alpha' \Sigma_{XY} \beta \quad \text{s.t.} \quad \alpha' \Sigma_X \alpha = A, \ \beta' \Sigma_Y \beta = B \ \text{and} \ \beta_1 = 1,$$

where $A = \alpha^{o\prime} \Sigma_X \alpha^o$ and $B = \beta^{o\prime} \Sigma_Y \beta^o$. Without the constraint $\beta_1 = 1$, this would yield the same solutions (up to some rescaling of $\alpha$ and $\beta$) as the solutions given by Canonical Correlation. In general, the solutions differ due to this constraint. Even though the OLS technique is better known and more immediately accessible to practitioners, it artificially breaks the symmetry between variables by singling out the role of $Y_1$. Note that in the case where $Y$ is univariate ($d_y = 1$) the constraint $\beta_1 = 1$ has no bite, and the two solutions coincide (again, up to rescaling).

Many papers have used Canonical Correlation or OLS techniques to estimate $\alpha$ and $\beta$. Notable examples of the application of Canonical Correlation on the marriage market are Suen and Lui (1999), Gautier et al. (2005) and Taubman (2006).
Many papers have applied OLS techniques to study assortative mating when faced with multiple dimensions; see Kalmijn (1998) for a survey of this literature. A notable example of such applications of OLS is the extensive literature on the effect of a wife’s education on her husband’s earnings: see among others Benham (1974), Scully (1979), Wong (1986), Lam and Schoeni (1993, 1994), and Jepsen (2005).
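To make Program (1) concrete, the following is a minimal sketch of how the first pair of canonical weights could be computed from a finite sample, with sample moments standing in for the population moments. The function name `canonical_weights` and the simulated Gaussian data are illustrative assumptions, not part of the original note. The sketch whitens the two blocks with Cholesky factors and reads the weights off a singular value decomposition.

```python
import numpy as np

def canonical_weights(X, Y):
    """Return (alpha_c, beta_c, rho): first canonical weights and correlation.

    X is an (n, d_x) array and Y an (n, d_y) array of matched couples' attributes.
    (Hypothetical helper written for illustration.)
    """
    X = X - X.mean(axis=0)  # center, as assumed in the text
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    S_x, S_y, S_xy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    # With L_x L_x' = S_x and a = L_x' alpha, the constraint
    # alpha' S_x alpha = 1 becomes |a| = 1 (similarly for beta).
    L_x = np.linalg.cholesky(S_x)
    L_y = np.linalg.cholesky(S_y)
    M = np.linalg.solve(L_x, S_xy) @ np.linalg.inv(L_y).T
    U, s, Vt = np.linalg.svd(M)
    alpha_c = np.linalg.solve(L_x.T, U[:, 0])  # map back to original weights
    beta_c = np.linalg.solve(L_y.T, Vt[0])
    return alpha_c, beta_c, s[0]

# Illustrative case: a single female attribute that is linear in alpha'X.
rng = np.random.default_rng(0)
alpha_true = np.array([0.6, 0.8])
X = rng.standard_normal((5000, 2))
Y = 2.0 * (X @ alpha_true)[:, None]
alpha_c, beta_c, rho = canonical_weights(X, Y)
ratio = alpha_c[0] / alpha_c[1]
print(ratio, rho)
```

In this linear case the ratio of the estimated weights matches that of `alpha_true`; the weights themselves are only determined up to sign and scale.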
The consistency problem. A crucial question is whether the Canonical Correlation method is consistent, namely whether $(\alpha^c, \beta^c) = (\alpha, \beta)$. It turns out that the answer is yes in the case of Gaussian marginal distributions $P$ and $Q$, but no in more general cases, as we shall now explain. The main statement, part (ii) of the theorem, is proven using a counterexample.

Theorem 1 ((In-)Consistency of Canonical Correlation). The following holds:
(i) If $P$ and $Q$ are Gaussian distributions, then Canonical Correlation is consistent in the sense that $(\alpha^c, \beta^c) = (\alpha, \beta)$.
(ii) In general, Canonical Correlation is not consistent.

Proof. (i) When $P = N(0, \Sigma_X)$ and $Q = N(0, \Sigma_Y)$, with $\alpha, \beta \neq 0$ two vectors of weights, then

$$\max_{X \sim P,\, Y \sim Q} \mathbb{E}\left[\alpha' X Y' \beta\right] = \sqrt{\alpha' \Sigma_X \alpha} \, \sqrt{\beta' \Sigma_Y \beta},$$

where the optimization is over the set of random vectors $(X, Y)$ with fixed marginal distributions $P$ and $Q$. Thus, for $(X, Y)$ solution of the above problem, the correlation between $\alpha' X$ and $\beta' Y$ is one. Indeed, the optimal $(X, Y)$ is such that

$$\beta' Y = \sqrt{\frac{\beta' \Sigma_Y \beta}{\alpha' \Sigma_X \alpha}} \; \alpha' X.$$

The result is immediate: for the optimal $(X, Y)$, the correlation between $\alpha' X$ and $\beta' Y$ is one, and since this is the maximal value of Program (1), it follows that $(\alpha, \beta) = (\alpha^c, \beta^c)$.

(ii) However, when $P$ and $Q$ fail to be Gaussian, the Canonical Correlation estimator $(\alpha^c, \beta^c)$ differs from the true parameters $(\alpha, \beta)$ in general. Consider the following example. Let $P$ be the distribution of $(X_1, X_2)$ where $X_1$ is independent of $X_2$ and $V(X_1) = V(X_2) = 1$. Let $Q$ be the distribution of the scalar $Y$. Provided that the surplus function satisfies $\Phi(x, y) = \phi(\alpha' x, \beta y)$ such that sorting is unidimensional, optimal matching yields

$$Y = \frac{T(\alpha_1 X_1 + \alpha_2 X_2)}{\beta},$$

where, as in Assumption 1, $T := F^{-1}_{\beta Y}(F_{\alpha' X}(\cdot))$ and $F_{\gamma' Z}$ denotes the c.d.f. of $\gamma' Z$. Note that the mapping function $T$ depends on $P$, $Q$, $\alpha$ and $\beta$. In this setting, the Canonical Correlation estimator $(\alpha_1^c, \alpha_2^c)$ of $(\alpha_1, \alpha_2)$ solves

$$\max_{\alpha_1, \alpha_2} \; \alpha_1 \, \mathrm{cov}(X_1, Y) + \alpha_2 \, \mathrm{cov}(X_2, Y) \quad \text{s.t.} \quad \alpha_1^2 + \alpha_2^2 = 1,$$

whose solution satisfies

$$\frac{\alpha_1^c}{\alpha_2^c} = \frac{\mathrm{cov}(X_1, Y)}{\mathrm{cov}(X_2, Y)}. \tag{2}$$

In such an economy, data on “couples” are such that $Y = T(\alpha_1 X_1 + \alpha_2 X_2)/\beta$ for all $X$. Replacing $Y$ by its expression in terms of $X$ in the right-hand side of eq. (2) yields

$$\frac{\alpha_1^c}{\alpha_2^c} = \frac{\mathrm{cov}(X_1, T(\alpha_1 X_1 + \alpha_2 X_2))}{\mathrm{cov}(X_2, T(\alpha_1 X_1 + \alpha_2 X_2))}.$$

It follows that the Canonical Correlation estimator is consistent if and only if $\alpha_1^c / \alpha_2^c = \alpha_1 / \alpha_2$, that is if and only if

$$\frac{\alpha_1}{\alpha_2} = \frac{\mathrm{cov}(X_1, T(\alpha_1 X_1 + \alpha_2 X_2))}{\mathrm{cov}(X_2, T(\alpha_1 X_1 + \alpha_2 X_2))}. \tag{3}$$

It is easy to see that this condition is satisfied whenever $T$ is linear (with constant $a$ and slope $b$), a case that arises for instance when $P$ and $Q$ are Gaussian as in (i), for then one has

$$\frac{\mathrm{cov}(X_1, a + b\alpha_1 X_1 + b\alpha_2 X_2)}{\mathrm{cov}(X_2, a + b\alpha_1 X_1 + b\alpha_2 X_2)} = \frac{\alpha_1}{\alpha_2}.$$

However, as soon as $T$ is nonlinear, there is no reason to expect that $T$ will satisfy condition (3). For instance, let $P$ be the distribution of $(X_1, X_2)$ where $X_1$ takes value $1$ with probability $1/2$ and $-1$ with probability $1/2$, and $X_2$ is exponentially distributed with parameter 1 and independent of $X_1$. Let $G$ be the c.d.f. of $X_2$, so that $G(z) = 1 - \exp(-z)$ for $z \geq 0$ (and $G(z) = 0$ for $z < 0$). Let $Q = U([0, 1])$ and $\alpha_1 = \alpha_2 = 1/\sqrt{2}$, so that $\hat{X} = (X_1 + X_2)/\sqrt{2}$. Hence the optimal coupling $(\hat{X}, \hat{Y})$ is such that $\hat{Y} = F_{\hat{X}}(\hat{X})$, where $F_{\hat{X}}(\cdot)$ is the c.d.f. of $\hat{X}$, which is expressed as

$$F_{\hat{X}}(x) = \frac{1}{2}\left(G\left(x\sqrt{2} + 1\right) + G\left(x\sqrt{2} - 1\right)\right).$$

Clearly, in this example $T := F_{\hat{X}}$ is not linear, so one should not expect Canonical Correlation to be consistent. And, indeed, one has

$$\hat{Y} = \begin{cases} \frac{1}{2}\left(G(X_2) + G(X_2 - 2)\right) & \text{if } X_1 = -1, \\ \frac{1}{2}\left(G(X_2 + 2) + G(X_2)\right) & \text{if } X_1 = 1, \end{cases}$$

and a calculation shows that

$$\mathrm{cov}(X_1, \hat{Y}) = \frac{1}{4}\left(\mathbb{E}\, G(X_2 + 2) - \mathbb{E}\, G(X_2 - 2)\right).$$

Since $\mathbb{E}\, G(X_2 + 2) = 1 - e^{-2}/2$ and $\mathbb{E}\, G(X_2 - 2) = e^{-2}/2$, we get

$$\mathrm{cov}(X_1, \hat{Y}) = \frac{1}{4}\left(1 - e^{-2}\right). \tag{4}$$

Similarly,

$$\mathbb{E}[X_2 \hat{Y}] = \frac{1}{4}\mathbb{E}[X_2 G(X_2 - 2)] + \frac{1}{4}\mathbb{E}[X_2 G(X_2 + 2)] + \frac{1}{2}\mathbb{E}[X_2 G(X_2)],$$

and using the fact that $\mathbb{E}[X_2 G(X_2 - 2)] = 7e^{-2}/4$, that $\mathbb{E}[X_2 G(X_2 + 2)] = 1 - e^{-2}/4$, and that $\mathbb{E}[X_2 G(X_2)] = 3/4$, we get $\mathbb{E}[X_2 \hat{Y}] = (3e^{-2} + 5)/8$, hence, as $\mathbb{E}[X_2]\,\mathbb{E}[\hat{Y}] = 1/2$,

$$\mathrm{cov}(X_2, \hat{Y}) = \frac{3e^{-2} + 1}{8}. \tag{5}$$

Using (4) and (5), eq. (2) becomes

$$\frac{\alpha_1^c}{\alpha_2^c} = \frac{2\left(1 - e^{-2}\right)}{1 + 3e^{-2}} = \frac{2\left(e^{2} - 1\right)}{3 + e^{2}} \approx 1.23 \neq \frac{\alpha_1}{\alpha_2} = 1.$$

Therefore the Canonical Correlation estimator is not consistent in this example. □
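The counterexample in part (ii) can be checked numerically. The sketch below is a Monte Carlo illustration under the assumptions stated in the proof ($X_1 = \pm 1$ with equal probability, $X_2$ exponential with parameter 1, $Q$ uniform on $[0,1]$, $\alpha_1 = \alpha_2 = 1/\sqrt{2}$). The sample size, seed, and variable names are illustrative choices. It compares simulated covariances with the closed forms (4) and (5).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # illustrative sample size

x1 = rng.choice([-1.0, 1.0], size=n)   # X1 = +1 or -1 with probability 1/2
x2 = rng.exponential(1.0, size=n)      # X2 ~ Exp(1), independent of X1

def G(z):
    """C.d.f. of Exp(1): 1 - exp(-z) for z >= 0 and 0 for z < 0."""
    return 1.0 - np.exp(-np.maximum(z, 0.0))

# Optimal match: Y_hat = F_Xhat(X_hat) with X_hat = (X1 + X2) / sqrt(2),
# i.e. F_Xhat evaluated at x * sqrt(2) = X1 + X2.
y_hat = 0.5 * (G(x1 + x2 + 1.0) + G(x1 + x2 - 1.0))

cov1_mc = np.cov(x1, y_hat)[0, 1]
cov2_mc = np.cov(x2, y_hat)[0, 1]

cov1_closed = (1.0 - np.exp(-2.0)) / 4.0        # eq. (4)
cov2_closed = (3.0 * np.exp(-2.0) + 1.0) / 8.0  # eq. (5)

ratio_mc = cov1_mc / cov2_mc
ratio_closed = cov1_closed / cov2_closed

print(cov1_mc, cov1_closed)
print(cov2_mc, cov2_closed)
print(ratio_mc, ratio_closed)  # both differ from the true ratio alpha1/alpha2 = 1
```

The simulated ratio should agree with the closed-form ratio up to sampling noise, and both stay away from 1, the ratio of the true weights.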
Note that the example in part (ii) of the proof also shows that OLS is inconsistent. In this example the dimension of $Y$ is one, so that OLS and Canonical Correlation yield the same estimators of $\alpha$ and $\beta$. The above example is in no way pathological and implies that estimators of $(\alpha, \beta)$ based on Canonical Correlation face the risk of being biased as soon as the marginal distributions are not Gaussian, or are such that the mapping function $T := F^{-1}_{\beta' Y}(F_{\alpha' X}(\cdot))$ is not linear.

Final remarks. The problem discussed in this paper obviously raises the question: how can we replace Canonical Correlation by a technique that is consistent? One first proposal, as suggested in Terviö (2003, p. 83), is to look for $\alpha$ and $\beta$ that maximize Spearman’s rank correlation between $\alpha' X$ and $\beta' Y$. In other words, look for

$$\max_{\alpha \in \mathbb{R}^{d_x},\, \beta \in \mathbb{R}^{d_y}} \mathbb{E}\left[F_{\alpha' X}(\alpha' X) \, F_{\beta' Y}(\beta' Y)\right] \quad \text{s.t.} \quad \alpha' \Sigma_X \alpha = 1 \ \text{and} \ \beta' \Sigma_Y \beta = 1,$$

where we recall that $F_{\alpha' X}$ stands for the c.d.f. of $\alpha' X$. The value of this program cannot exceed 1/3 and, when the distributions of $X$ and $Y$ are continuous, it is equal to 1/3 when $\alpha' X$ and $\beta' Y$ are comonotone. However the objective function, which can be rewritten as

$$\int \Pr\left(\max\left(\alpha'(x - X), \, \beta'(y - Y)\right) \leq 0\right) dF_X(x) \, dF_Y(y),$$

has no reason to be convex with respect to $\alpha$ and $\beta$, so global optimization techniques may be needed. Also, this technique, just as Canonical Correlation, does not deal with any kind of unobserved heterogeneity. To remedy this drawback, two solutions have very recently been proposed.

The first solution is justified if one is willing to assume that sorting occurs on a single index of attractiveness. This strategy, developed by Chiappori et al. (2012), consists in estimating the conditional expectations $\mathbb{E}[Y_k \mid X = x]$, which, if the sorting actually occurs on a single index, should be a deterministic function of $\alpha' x$.
Hence the weight vector $\alpha$ is identified up to a constant by the marginal rates of substitution

$$\frac{\alpha_i}{\alpha_j} = \frac{\partial \mathbb{E}[Y_k \mid X = x] / \partial x_i}{\partial \mathbb{E}[Y_k \mid X = x] / \partial x_j}.$$

Moving outside of single-dimensional indices, Dupuy and Galichon (2014) have introduced a technique they call “saliency analysis”, which makes it possible to infer the number of dimensions on which sorting occurs, and to estimate the corresponding (possibly multiple) indices of attractiveness that determine this sorting.
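As an illustration of the rank-correlation proposal discussed at the start of these remarks, the sketch below maximizes the sample Spearman rank correlation by a plain grid search over directions $\alpha = (\cos\theta, \sin\theta)$, using data simulated from the counterexample of Theorem 1 (where $Y$ is scalar, so only $\alpha$ needs to be found). The grid search, the sample size and all names are assumptions made for illustration; since the objective need not be well behaved, a global search of this kind is only practical in low dimensions. For continuous distributions Spearman's correlation is an increasing affine function of the objective $\mathbb{E}[F_{\alpha'X}(\alpha'X) F_{\beta'Y}(\beta'Y)]$, so the two maximizers coincide.

```python
import numpy as np

def ranks(v):
    """Ranks 0..n-1 of the entries of v (ties are negligible for continuous data)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(u, v):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(ranks(u), ranks(v))[0, 1]

# Data from the counterexample: true weights alpha1 = alpha2 (theta = pi/4).
rng = np.random.default_rng(0)
n = 50_000
x1 = rng.choice([-1.0, 1.0], size=n)
x2 = rng.exponential(1.0, size=n)
G = lambda z: 1.0 - np.exp(-np.maximum(z, 0.0))
y = 0.5 * (G(x1 + x2 + 1.0) + G(x1 + x2 - 1.0))

# Grid search over directions (cos t, sin t), one-degree steps on [0, 90] degrees.
thetas = np.linspace(0.0, np.pi / 2, 91)
scores = [spearman(np.cos(t) * x1 + np.sin(t) * x2, y) for t in thetas]
theta_rank = thetas[int(np.argmax(scores))]

# Canonical Correlation direction on the same data, for comparison (eq. (2)).
theta_cca = np.arctan2(np.cov(x2, y)[0, 1], np.cov(x1, y)[0, 1])

print(np.degrees(theta_rank), np.degrees(theta_cca))
```

In this simulated example the rank-based search selects the true direction on the grid, whereas the Canonical Correlation direction is visibly tilted away from it.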
The idea is to estimate $A$ in the quadratic specification for the surplus function

$$\Phi(x, y) = x' A y,$$

applying for instance the procedure described in Dupuy and Galichon (2014), and using a singular value decomposition to test whether the rank of $A$ is e.g. one, in which case $A = \alpha \beta'$. This provides a consistent estimation of $\alpha$ and $\beta$. Note, however, that the units of the parameters of the affinity matrix reflect the units in which $X$ and $Y$ are measured. For the method to be robust to changes in measurement units, we need to normalize the attributes in $X$ and $Y$. By performing the singular value decomposition on the affinity matrix associated with the normalized attributes, we ensure that the loadings of the indices of mutual attractiveness are independent of the choice of measurement units. For notational compactness, we simply assume here that $X$ and $Y$ have been rescaled such that all attributes have variance 1.

Performing a singular value decomposition of $A$ yields

$$A = U' \Lambda V,$$

where the diagonal matrix $\Lambda$ has nonincreasing elements $(\lambda_1, ..., \lambda_d)$, called singular values, on its diagonal, with $d = \min(d_x, d_y)$. By construction, $U$ and $V$ are orthogonal matrices. One can then define vectors of indices of mutual attractiveness by constructing

$$\tilde{X} = U X \quad \text{and} \quad \tilde{Y} = V Y,$$

where each index is a weighted sum of the attributes in $X$ and $Y$ respectively.

Denote $A_{\tilde{X}\tilde{Y}}$ the affinity matrix corresponding to the vectors of characteristics $\tilde{X}$ and $\tilde{Y}$. Dupuy and Galichon (2014) have shown that in fact $A_{\tilde{X}\tilde{Y}} = \Lambda$, and as a result

$$\Phi(x, y) = \sum_{i=1}^{d_x} \sum_{j=1}^{d_y} A_{ij} x_i y_j = \sum_{i=1}^{d} \lambda_i \tilde{x}_i \tilde{y}_i.$$
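The decomposition step can be sketched as follows, taking an affinity matrix $A$ as given. The estimation of $A$ itself is not shown, and the numbers and the helper name `saliency` are hypothetical. The code uses a thin SVD and maps NumPy's convention $A = u \,\mathrm{diag}(\lambda)\, v^h$ to the convention $A = U' \Lambda V$ used in the text.

```python
import numpy as np

def saliency(A):
    """Thin SVD of the affinity matrix in the convention A = U' diag(lam) V.

    Rows of U (resp. V) hold the weights of the men's (resp. women's) indices;
    `shares` is lam_i / sum_j lam_j. (Hypothetical helper for illustration.)
    """
    u, lam, vh = np.linalg.svd(A, full_matrices=False)
    U, V = u.T, vh
    shares = lam / lam.sum()
    return U, lam, V, shares

# Hypothetical rank-one affinity matrix A = 2 * alpha beta' (d_x = 3, d_y = 2).
alpha = np.array([1.0, 2.0, 2.0]) / 3.0
beta = np.array([3.0, 4.0]) / 5.0
A = 2.0 * np.outer(alpha, beta)

U, lam, V, shares = saliency(A)

# The surplus computed from the attributes equals the surplus computed from
# the indices x_tilde = U x and y_tilde = V y.
x = np.array([0.5, -1.0, 2.0])
y = np.array([1.5, 0.3])
phi_direct = x @ A @ y
phi_indices = np.sum(lam * (U @ x) * (V @ y))

print(lam, shares)
print(U[0], V[0])  # equal to alpha and beta up to a common sign
```

With a single nonzero singular value, the first rows of `U` and `V` recover the directions of `alpha` and `beta`, and the first share is one.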
The weights of each index of mutual attractiveness constructed by Saliency Analysis can be read on the associated row of $U$ for men and $V$ for women, whereas the share of the matching utility of couples explained by the $i$-th pair of indices is given by $\lambda_i / (\sum_j \lambda_j)$. Saliency Analysis answers two important questions: how many and which attributes matter for the sorting of men and women on the marriage market. Intuitively, the number of nonzero singular values indicates the number of indices that matter for the sorting problem, and the parameters of $U$ and $V$ indicate which attributes matter in each index of men and each index of women. If there is only one nonzero singular value, then sorting occurs on a single index whose weights are given by the first row of $U$ for men and $V$ for women and correspond to $\alpha$ and $\beta$ respectively.

References

[1] Becker, Gary (1973). “A theory of marriage, part I,” Journal of Political Economy, 81, pp. 813–846.
[2] Benham, Lee (1974). “Benefits of women’s education within marriage,” Journal of Political Economy, 82(2), pp. S57–S71.
[3] Chiappori, Pierre-André, Sonia Oreffice, and Clement Quintana-Domeque (2012). “Fatter attraction: anthropometric and socioeconomic matching on the marriage market,” Journal of Political Economy, 120(4), pp. 659–695.
[4] Dupuy, Arnaud, and Alfred Galichon (2014). “Personality traits and the marriage market,” to appear in the Journal of Political Economy.
[5] Gautier, Pieter, Michael Svarer, and Coen Teulings (2010). “Marriage and the city: Search frictions and sorting of singles,” Journal of Urban Economics.
[6] Hotelling, Harold (1936). “Relations between two sets of variates,” Biometrika, 28, pp. 321–329.
[7] Jepsen, Lisa (2005). “The relationship between wife’s education and husband’s earnings: Evidence from 1960–2000,” Review of Economics of the Household, 3, pp. 197–214.
[8] Kalmijn, Matthijs (1998). “Intermarriage and Homogamy: Causes, Patterns, Trends,” Annual Review of Sociology, 24, pp. 395–421.
[9] Lam, David, and Robert Schoeni (1993). “Effects of family background on earnings and returns to schooling: Evidence from Brazil,” Journal of Political Economy, 101(4), pp. 710–740.
[10] Lam, David, and Robert Schoeni (1994). “Family ties and labour markets in the United States and Brazil,” Journal of Human Resources, 29, pp. 1235–1258.
[11] Scully, Gerald (1979). “Mullahs, Muslims and marital sorting,” Journal of Political Economy, 87, pp. 1139–1143.
[12] Shapley, Lloyd, and Martin Shubik (1972). “The Assignment Game I: The Core,” International Journal of Game Theory, 1, pp. 111–130.
[13] Suen, Wing, and Hon-Kwong Lui (1999). “A direct test of efficient marriage market hypothesis,” Economic Inquiry, 37(1), pp. 29–46.
[14] Taubman, Orit (2006). “Couple similarity for driving style,” Transportation Research Part F, 9, pp. 185–193.
[15] Terviö, Marko (2003). “Studies of Talent Markets,” MIT PhD dissertation.
[16] Wong, Yue-Chim (1986). “Entrepreneurship, Marriage, and Earnings,”