A simple coding for cross-domain matching with dimension reduction via spectral graph embedding
Hidetoshi Shimodaira*
Division of Mathematical Science
Graduate School of Engineering Science
Osaka University
1-3 Machikaneyama-cho, Toyonaka, Osaka, Japan
e-mail: [email protected]

*Supported in part by Grant KAKENHI (24300106, 26120523) from MEXT of Japan.
Abstract:
Data vectors are obtained from multiple domains. They are feature vectors of images or vector representations of words. Domains may have different numbers of data vectors with different dimensions. These data vectors from multiple domains are projected to a common space by linear transformations in order to search for closely related vectors across domains. We would like to find projection matrices that minimize the distances between closely related data vectors. This formulation of cross-domain matching is regarded as an extension of the spectral graph embedding to the multi-domain setting, and it includes several multivariate analysis methods of statistics such as multiset canonical correlation analysis, correspondence analysis, and principal component analysis. Similar approaches have recently become very popular in pattern recognition and vision. In this paper, instead of proposing a novel method, we introduce an embarrassingly simple idea of coding the data vectors that explains all the above-mentioned approaches. A data vector is concatenated with zero vectors from all other domains to make an augmented vector. The cross-domain matching is then solved by applying the single-domain version of spectral graph embedding to these augmented vectors from all the domains. An interesting connection to the classical associative memory model of neural networks is also discussed by noticing a coding for association. A cross-validation method for choosing the dimension of the common space and a regularization parameter is discussed in an illustrative numerical example.
Keywords and phrases: multiple domains, common space, matching weight, multivariate analysis, canonical correlation analysis, spectral graph embedding, associative memory, sparse coding.
1. Introduction
We consider multiple domains for getting data vectors. Let $D$ be the number of domains, and $d = 1, \ldots, D$ denote each domain. For example, $d = 1$ may be for images, and $d = 2$ for words. From domain $d$, we get data vectors $x_i^d \in \mathbb{R}^{p_d}$, $i = 1, \ldots, n_d$, where $n_d$ is the number of data vectors and $p_d$ is the dimension of the data vectors. They may be image feature vectors for $d = 1$, and word vectors computed by word2vec (Mikolov et al., 2013) from texts for $d = 2$. Typically, $p_d$ is hundreds, and $n_d$ is thousands to millions. We would like to retrieve relevant words from an image query, and alternatively retrieve images from a word query.

We specify the strength of association between two data vectors $x_i^d$ and $x_j^e$ by a matching weight $w_{ij}^{de} \in \mathbb{R}$ for $d, e = 1, \ldots, D$, $i = 1, \ldots, n_d$, $j = 1, \ldots, n_e$. (Note that "matching" here is nothing related to that of graph theory.) We assume the weight is symmetric: $w_{ij}^{de} = w_{ji}^{ed}$. For example, $w_{11}^{12} = 3$ for the association between an image "apple" ($x_1^1$) and the word "apple" ($x_1^2$), and $w_{12}^{12} = 1$ for the association between the image "apple" and the word "red" ($x_2^2$). However, it could be the case that the image apple is unlabeled and $w_{11}^{12} = 0$, while the color may be automatically classified as red and $w_{12}^{12} = 1$ remains. Let $\bar w_{ij}^{de}$ be the matching weights representing the underlying true associations, and $w_{ij}^{de}$ be the observed ones sampled from the true associations. We assume $w_{ij}^{de} = \bar w_{ij}^{de}$ with a small probability, and $w_{ij}^{de} = 0$ otherwise, so that $W^{de} = (w_{ij}^{de}) \in \mathbb{R}^{n_d \times n_e}$ would be a sparse matrix.

The data vectors from all the domains will be projected to a single common space $\mathbb{R}^K$ for some $K > 0$.
Using a matrix $A^d \in \mathbb{R}^{p_d \times K}$, we define a linear transformation by
$$ y_i^d = (A^d)^T x_i^d, \quad i = 1, \ldots, n_d; \ d = 1, \ldots, D. \tag{1} $$
Here $T$ denotes the matrix transpose. Later we use matrix notation such as $\mathrm{tr}(\cdot)$ for the matrix trace, $\mathrm{diag}(\cdot)$ for a diagonal matrix, and $\mathrm{Diag}(\cdot)$ for a block diagonal matrix. Each element of $y_i^d \in \mathbb{R}^K$ is $(y_i^d)_k = (a_k^d)^T x_i^d$, $k = 1, \ldots, K$, where $a_k^d \in \mathbb{R}^{p_d}$ are defined as $A^d = (a_1^d, \ldots, a_K^d)$. The error function of cross-domain matching is
$$ \phi(A^1, \ldots, A^D) = \frac{1}{2} \sum_{d=1}^D \sum_{e=1}^D \sum_{i=1}^{n_d} \sum_{j=1}^{n_e} w_{ij}^{de} \, \| y_i^d - y_j^e \|^2, \tag{2} $$
and we would like to find $A^1, \ldots, A^D$ that minimize (2) subject to certain constraints. This is supervised learning with the matching weights as training data. It handles the problems of semi-supervised learning and missing observations by simply setting the unobserved weights to zero. For a new query image, say, the data vector $x^1 \in \mathbb{R}^{p_1}$ is transformed to $y^1 = (A^1)^T x^1$. Then we look for points close to $y^1$ in the collection $\{y_i^d\}$. By working on the common space in this way, we can perform data retrieval across domains and data fusion from multiple domains.

This formulation of cross-domain matching is regarded as an extension of the spectral graph embedding of Yan et al. (2007) to the multi-domain setting, and similar approaches have recently become very popular in pattern recognition and vision (Correa et al., 2010; Yuan et al., 2011; Kan et al., 2012; Huang et al., 2013; Shi et al., 2013; Wang et al., 2013; Gong et al., 2014; Yuan and Sun, 2014). In particular, the formulation reduces to a classical multivariate analysis of statistics, known as multiset canonical correlation analysis (MCCA) (Kettenring, 1971; Takane, Hwang and Abdi, 2008; Tenenhaus and Tenenhaus, 2011), by letting $n_1 = n_2 = \cdots = n_D$ and connecting all vectors across domains with the same index, i.e., $w_{ii}^{de} \neq 0$ and $w_{ij}^{de} = 0$ for $i \neq j$. Class labels are coded by indicator variables (called dummy variables in statistics) and treated as domains; they appear in canonical discriminant analysis and correspondence analysis. The formulation becomes the classical canonical correlation analysis (CCA) of Hotelling (1936) by further letting $D = 2$, or it becomes principal component analysis (PCA) by letting $p_1 = p_2 = \cdots = p_D = 1$.

In this paper, we do not intend to propose a novel method. Instead, we introduce an embarrassingly simple idea of coding the data vectors that explains all the above-mentioned approaches. This coding is similar to that of Daumé III (2009). Let $P = \sum_{d=1}^D p_d$ and $N = \sum_{d=1}^D n_d$. The data vector $x_i^d$ is coded as an augmented vector $\tilde x_i^d \in \mathbb{R}^P$ defined as
$$ (\tilde x_i^d)^T = \left( (0_{p_1})^T, \ldots, (0_{p_{d-1}})^T, (x_i^d)^T, (0_{p_{d+1}})^T, \ldots, (0_{p_D})^T \right). \tag{3} $$
Here, $0_p \in \mathbb{R}^p$ is the vector with all elements zero. This is a sparse coding (Olshausen and Field, 2004) in the sense that the nonzero elements for different domains do not overlap each other. All the $N$ vectors of all domains are now represented as points in the same $\mathbb{R}^P$. We will get the solution of the optimization problem of (2) by applying the single-domain version of the spectral graph embedding of Yan et al. (2007) to these $\tilde x_i^d$ vectors.

In Section 2, we review the spectral graph embedding methods. In Section 3, we show that the coding (3) solves the minimization of (2).
An interesting connection to the classical associative memory model of neural networks (Kohonen, 1972; Nakano, 1972) is also discussed there by noticing that the coding $\tilde x_i^d + \tilde x_j^e$ corresponds to the matching weight $w_{ij}^{de}$. In Section 4, the relations to the multivariate analysis methods are explained. In Section 5, we show an illustrative numerical example of cross-domain matching. In particular, we discuss a cross-validation method for choosing the dimension $K$ of the common space and a regularization parameter; there we resample the matching weights $w_{ij}^{de}$ instead of the data vectors $x_i^d$.
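To make the coding (3) concrete, here is a minimal Python sketch; the paper itself gives no code, so numpy and the function name `augment` are our own choices. It builds the augmented vector $\tilde x_i^d$ by padding a data vector with the zero blocks of all the other domains.

```python
import numpy as np

def augment(x, d, dims):
    """Code a data vector x from domain d (0-based index) as the augmented
    vector of eq. (3): x is placed in its own block, and all other domain
    blocks are zero, giving a vector in R^P with P = sum(dims)."""
    blocks = [x if e == d else np.zeros(p) for e, p in enumerate(dims)]
    return np.concatenate(blocks)

# A toy illustration with D = 3 domains of dimensions p_1, p_2, p_3.
dims = [2, 3, 4]                 # p_1 = 2, p_2 = 3, p_3 = 4, so P = 9
x = np.array([1.0, 2.0, 3.0])    # a data vector from domain d = 2
print(augment(x, 1, dims))       # -> [0. 0. 1. 2. 3. 0. 0. 0. 0.]
```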
2. A brief review of the spectral graph embedding
Before discussing the cross-domain matching, we review here the spectral graph theory (Chung, 1997) in Section 2.1. We then consider extra constraints in Section 2.2. These results will be used for the cross-domain matching in Section 3. The following argument is based on spectral clustering, in particular the normalized graph Laplacian (Shi and Malik, 2000; Ng et al., 2002; Von Luxburg, 2007), and the spectral embedding (Belkin and Niyogi, 2003).

2.1. Spectral embedding
Let $N$ be the number of vertices of an undirected graph, which we represent as points $y_i \in \mathbb{R}^K$, $i = 1, \ldots, N$, of dimension $K \le N$. The weighted adjacency matrix is $W = (w_{ij}) \in \mathbb{R}^{N \times N}$ with symmetric weights $w_{ij} = w_{ji} \ge 0$, $i, j = 1, \ldots, N$. Let $M = \mathrm{diag}(W 1_N) \in \mathbb{R}^{N \times N}$ be the diagonal matrix with elements $\sum_{j=1}^N w_{ij}$, $i = 1, \ldots, N$. Here $1_N \in \mathbb{R}^N$ denotes the vector with all elements being 1. The graph Laplacian is $M - W$.

For a given $W$, we would like to find $y_1, \ldots, y_N$ that minimize the error function
$$ \phi(y_1, \ldots, y_N) = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N w_{ij} \| y_i - y_j \|^2 \tag{4} $$
subject to certain constraints. For avoiding the trivial solution of all zero vectors, we assume the constraints
$$ Y^T M Y = I_K, \tag{5} $$
where $Y \in \mathbb{R}^{N \times K}$ is defined by $Y^T = (y_1, \ldots, y_N)$, and $I_K \in \mathbb{R}^{K \times K}$ is the identity matrix. By simple rearrangement of the formula, we get
$$ \mathrm{tr}(Y^T W Y) = \sum_{i=1}^N \sum_{j=1}^N w_{ij} \, y_i^T y_j, \qquad \mathrm{tr}(Y^T M Y) = \sum_{i=1}^N \Bigl( \sum_{j=1}^N w_{ij} \Bigr) \| y_i \|^2. $$
Thus the error function is rewritten as
$$ \phi(y_1, \ldots, y_N) = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N w_{ij} \bigl( \| y_i \|^2 + \| y_j \|^2 - 2 y_i^T y_j \bigr) = \mathrm{tr}(Y^T (M - W) Y) = K - \mathrm{tr}(Y^T W Y). $$
Therefore, minimization of (4) is equivalent to maximization of $\mathrm{tr}(Y^T W Y)$.

Let $M^{-1/2} \in \mathbb{R}^{N \times N}$ be the diagonal matrix with elements $(\sum_{j=1}^N w_{ij})^{-1/2}$, $i = 1, \ldots, N$. The eigenvalues of $M^{-1/2} W M^{-1/2}$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$, and the corresponding normalized eigenvectors are $u_1, \ldots, u_N \in \mathbb{R}^N$. The solution of minimizing (4) subject to (5) is given by $Y = M^{-1/2} (u_1, \ldots, u_K)$.

2.2. Extra constraints and regularization

In addition to the constraints (5), Yan et al. (2007) introduced extra constraints that the column vectors of $Y$ are included in a specified linear subspace. Let us specify $x_i \in \mathbb{R}^P$, $i = 1, \ldots, N$, with some $K \le P \le N$. Define the data matrix $X \in \mathbb{R}^{N \times P}$ by $X^T = (x_1, \ldots, x_N)$. We assume that $Y$ is expressed in the form
$$ Y = XA \tag{6} $$
using an arbitrary matrix $A \in \mathbb{R}^{P \times K}$. Therefore, minimization of (4) is equivalent to finding $A$ that maximizes $\mathrm{tr}(A^T X^T W X A)$ subject to the constraints $A^T X^T M X A = I_K$.

For numerical stability, we introduce quadratic regularization terms similar to those of Takane, Hwang and Abdi (2008). First, we define two $P \times P$ matrices by
$$ G = X^T M X + \gamma_M L_M, \qquad H = X^T W X + \gamma_W L_W. $$
Here $\gamma_M, \gamma_W \in \mathbb{R}$ are regularization parameters, and $L_M, L_W \in \mathbb{R}^{P \times P}$ are nonnegative definite, typically $L_M = L_W = I_P$. Then, we consider the optimization problem:
$$ \text{Maximize } \mathrm{tr}(A^T H A) \text{ with respect to } A \in \mathbb{R}^{P \times K} \tag{7} $$
$$ \text{subject to } A^T G A = I_K. \tag{8} $$
This reduces to the problem of Section 2.1 by letting $X = I_N$, $\gamma_M = \gamma_W = 0$. For the solution of the optimization problem, we let $G^{1/2} \in \mathbb{R}^{P \times P}$ denote one of the matrices satisfying $(G^{1/2})^T G^{1/2} = G$. Its inverse matrix is denoted by $G^{-1/2} = (G^{1/2})^{-1}$. These are easily computed by, say, the Cholesky decomposition or the spectral decomposition of a symmetric matrix. The eigenvalues of $(G^{-1/2})^T H G^{-1/2}$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_P$, and the corresponding normalized eigenvectors are $u_1, u_2, \ldots, u_P \in \mathbb{R}^P$. The solution of our optimization problem is
$$ A = G^{-1/2} (u_1, \ldots, u_K). \tag{9} $$
To see what we are actually solving, let us rewrite the error function (4) with respect to $A$ under the constraints (6) and (8).
\begin{align*}
\phi(A) &= \mathrm{tr}(Y^T (M - W) Y) \\
&= \mathrm{tr}(A^T X^T (M - W) X A) \\
&= \mathrm{tr}(A^T (G - H - \gamma_M L_M + \gamma_W L_W) A) \\
&= K - \mathrm{tr}(A^T H A) - \mathrm{tr}(A^T (\gamma_M L_M - \gamma_W L_W) A).
\end{align*}
Thus, maximization of (7) subject to (8) is equivalent to minimization of
$$ \phi(A) + \mathrm{tr}(A^T (\gamma_M L_M - \gamma_W L_W) A). \tag{10} $$
For the second term to work properly as a regularization term, $\gamma_M L_M - \gamma_W L_W$ should be nonnegative definite.
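The solution (9) is straightforward to compute numerically. The following Python sketch is our own illustration, assuming numpy/scipy and the typical choice $L_M = L_W = I_P$; the function name `spectral_embedding` is ours. It forms $G$ and $H$, takes a $G^{1/2}$ from the Cholesky decomposition, and collects the top-$K$ eigenvectors of $(G^{-1/2})^T H G^{-1/2}$.

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def spectral_embedding(X, W, K, gamma_M=0.0, gamma_W=0.0):
    """Minimal sketch of the regularized embedding of Section 2:
    maximize tr(A^T H A) subject to A^T G A = I_K, solved by eq. (9),
    with the typical choice L_M = L_W = I_P."""
    N, P = X.shape
    M = np.diag(W.sum(axis=1))                # M = diag(W 1_N)
    G = X.T @ M @ X + gamma_M * np.eye(P)
    H = X.T @ W @ X + gamma_W * np.eye(P)
    R = cholesky(G)                           # upper triangular, R^T R = G, so R is a G^{1/2}
    T1 = solve_triangular(R, H, trans='T')    # R^{-T} H
    S = solve_triangular(R, T1.T, trans='T').T  # (G^{-1/2})^T H G^{-1/2}
    lam, U = eigh(S)                          # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:K]           # take the K largest eigenvalues
    A = solve_triangular(R, U[:, idx])        # A = G^{-1/2} (u_1, ..., u_K), eq. (9)
    return A, lam[idx]
```

The embedded points are then obtained as $Y = XA$ by (6).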
3. Cross-domain matching correlation analysis
Now we are back to the cross-domain matching. We define several matrices for rewriting (1) and (2) in a simple form. The data matrices $X^d \in \mathbb{R}^{n_d \times p_d}$ for domains $d = 1, \ldots, D$ are defined by $(X^d)^T = (x_1^d, \ldots, x_{n_d}^d)$. We put these $D$ matrices in the block diagonal positions of an $N \times P$ matrix to define a large data matrix $X = \mathrm{Diag}(X^1, \ldots, X^D) \in \mathbb{R}^{N \times P}$. We concatenate the transformation matrices to define $A \in \mathbb{R}^{P \times K}$ as $A^T = ((A^1)^T, \ldots, (A^D)^T)$. The vectors in the common space are also concatenated to define $Y^d \in \mathbb{R}^{n_d \times K}$ and $Y \in \mathbb{R}^{N \times K}$ as $(Y^d)^T = (y_1^d, \ldots, y_{n_d}^d)$, $Y^T = ((Y^1)^T, \ldots, (Y^D)^T)$. The matching weight matrices are $W^{de} = (w_{ij}^{de}) \in \mathbb{R}^{n_d \times n_e}$ for $d, e = 1, \ldots, D$, and they are placed in a $D \times D$ array to define $W = (W^{de}) \in \mathbb{R}^{N \times N}$.

Using these matrices, the transformation (1) is written as (6), and the error function (2) is written as (4), or $\phi(A)$ of Section 2.2. Adding the regularization term to the error function, the objective function becomes (10), and the solution is (9). Thus, the cross-domain matching is solved by the single-domain version of the spectral graph embedding. An important point is that the large data matrix $X$ is expressed as
$$ X^T = (\tilde x_1^1, \ldots, \tilde x_{n_1}^1, \ldots, \tilde x_1^D, \ldots, \tilde x_{n_D}^D), $$
meaning that $X$ is the data matrix consisting of the augmented vectors. What we have done is, therefore, interpreted as simply applying the spectral graph embedding of Yan et al. (2007) to the $N$ augmented vectors in $\mathbb{R}^P$.

It would be better to rewrite the constraints (8) in terms of $A^1, \ldots, A^D$ for cross-domain matching. Notice $M = \mathrm{Diag}(M^1, \ldots, M^D)$ with $M^d = \mathrm{diag}((W^{d1}, \ldots, W^{dD}) 1_N)$, and so $X^T M X = \mathrm{Diag}((X^1)^T M^1 X^1, \ldots, (X^D)^T M^D X^D)$. For simplicity, we assume that the regularization matrix is written as a block diagonal matrix $L_M = \mathrm{Diag}(L_M^1, \ldots, L_M^D)$. Then we have
$$ A^T G A = \sum_{d=1}^D (A^d)^T \bigl( (X^d)^T M^d X^d + \gamma_M L_M^d \bigr) A^d = I_K. $$
This is expressed for the vectors in $A^d$ as
$$ \sum_{d=1}^D (a_k^d)^T \bigl( (X^d)^T M^d X^d + \gamma_M L_M^d \bigr) a_l^d = \delta_{kl}, \quad k, l = 1, \ldots, K, \tag{11} $$
using the Kronecker delta.

As a final remark of this section, we discuss a coding of matching for further implications. Let $E$ be the number of nonzero elements in the lower triangular part of $W$. In other words, $E$ is the number of edges in the graph. We define a diagonal matrix $\breve W \in \mathbb{R}^{E \times E}$ with elements of these nonzero $\{w_{ij}^{de}\}$. Instead of working on the vertices of the graph, here we work on the edges of the graph for data analysis. So, the data vector is now coded as $\tilde x_i^d + \tilde x_j^e$ for the matching weight $w_{ij}^{de}$. We define the data matrix $\breve X \in \mathbb{R}^{E \times P}$ by concatenating $\tilde x_i^d + \tilde x_j^e$ in the same order as $\breve W$. Since $\breve X^T \breve W \breve X = X^T M X + X^T W X$, minimization of (10) is equivalent to maximization of $\mathrm{tr}(A^T (\breve X^T \breve W \breve X + \gamma_M L_M + \gamma_W L_W) A)$. Therefore, the cross-domain matching is interpreted as a kind of PCA for input patterns coded as $\tilde x_i^d + \tilde x_j^e$. Interestingly, this idea is found in one of the classical neural network models. Any part of the memorized vector can be used as a key for recalling the whole vector in the auto-associative correlation matrix memory (Kohonen, 1972; Nakano, 1972). This associative memory may recall $\tilde x_i^d + \tilde x_j^e$ for the input key either $\tilde x_i^d$ or $\tilde x_j^e$. It would be a subject of future research to work on $\tilde x_i^d + \tilde x_j^e + \tilde x_k^f + \cdots$ for joint associations of three or more vectors.
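The reduction of this section amounts to two bookkeeping steps: stacking the domain data matrices block-diagonally (equivalently, forming the augmented vectors (3)) and tiling the matching weight matrices into the $N \times N$ array $W$. Here is a minimal sketch in that spirit; it reuses the `spectral_embedding` function sketched in Section 2, and the argument conventions (a list of $X^d$ and a nested list of $W^{de}$) are our own.

```python
import numpy as np
from scipy.linalg import block_diag

def cross_domain_matching(Xs, Ws, K, gamma_M=0.0, gamma_W=0.0):
    """Sketch of Section 3.  Xs: list of X^d (n_d x p_d) data matrices;
    Ws: nested list with Ws[d][e] = W^{de} (n_d x n_e matching weights).
    Applies the single-domain embedding to the augmented representation."""
    D = len(Xs)
    X = block_diag(*Xs)          # rows of X are the augmented vectors of eq. (3)
    W = np.block([[Ws[d][e] for e in range(D)] for d in range(D)])
    A, lam = spectral_embedding(X, W, K, gamma_M=gamma_M, gamma_W=gamma_W)
    Y = X @ A                    # all common-space vectors y_i^d stacked, eqs. (1)/(6)
    ps = np.cumsum([Xd.shape[1] for Xd in Xs])[:-1]
    return np.split(A, ps, axis=0), Y, lam   # per-domain A^1, ..., A^D
```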
4. Relations to multiset canonical correlation analysis
In this section, we assume that the numbers of vectors are the same for all domains. Then the cross-domain matching reduces to a classical multivariate analysis of statistics. Let $n_1 = \cdots = n_D = n$ and $N = nD$. We assume that the weight matrix is specified as $W^{de} = c_{de} I_n$ using coefficients $c_{de} \ge 0$, $d, e = 1, \ldots, D$. In this case, the cross-domain matching becomes a version of MCCA, where the connections between sets of variables are specified by the coefficients $c_{de}$ (Tenenhaus and Tenenhaus, 2011). Another version of MCCA with all $c_{de} = 1$ is discussed extensively in Takane, Hwang and Abdi (2008).

Here we show how the objective function (7) and the constraints (8) are expressed in the case of MCCA. Noting that $X^T W X$ is an array of $(X^d)^T W^{de} X^e = c_{de} (X^d)^T X^e$, $d, e = 1, \ldots, D$, we have
$$ \mathrm{tr}(A^T H A) = \sum_{d=1}^D \sum_{e=1}^D c_{de} \, \mathrm{tr}((A^d)^T (X^d)^T X^e A^e) + \gamma_W \sum_{d=1}^D \mathrm{tr}((A^d)^T L_W^d A^d) = \sum_{k=1}^K \Bigl( \sum_{d=1}^D \sum_{e=1}^D c_{de} (a_k^d)^T (X^d)^T X^e a_k^e + \sum_{d=1}^D \gamma_W (a_k^d)^T L_W^d a_k^d \Bigr). \tag{12} $$
For simplicity, we assumed that the regularization matrix is written as a block diagonal matrix $L_W = \mathrm{Diag}(L_W^1, \ldots, L_W^D)$. The constraints (8) are expressed as (11) with $M^d = (\sum_{e=1}^D c_{de}) I_n$. The constraints correspond to eq. (31) of Takane, Hwang and Abdi (2008), except for a difference in scaling, when $c_{de} = 1$, $L_M = L_W$, and $\gamma_M = D \gamma_W$.

Further assume that $p_1 = \cdots = p_D = 1$ and $P = D$. Each $X^d \in \mathbb{R}^{n \times 1}$ is a vector now, and $(G^{-1/2})^T H G^{-1/2}$ becomes the sample correlation matrix scaled by the factor $D^{-1}$. Thus, the cross-domain matching is equivalent to PCA.
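This PCA reduction is easy to verify numerically. The check below is our own (with $\gamma_M = \gamma_W = 0$ and all $c_{de} = 1$, so that $M^d = D I_n$ and each standardized column satisfies $(X^d)^T X^d = n$): for $D$ standardized one-dimensional domains, $(G^{-1/2})^T H G^{-1/2}$ equals the sample correlation matrix divided by $D$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 200, 4
Z = rng.standard_normal((n, D))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardize each column, as in Section 5
Xs = [Z[:, [d]] for d in range(D)]         # X^d in R^{n x 1}, i.e. p_d = 1
G = np.diag([n * D] * D)                   # X^T M X: M^d = D I_n, (X^d)^T X^d = n
H = np.block([[Xs[d].T @ Xs[e] for e in range(D)] for d in range(D)])
S = np.linalg.inv(np.sqrt(G)) @ H @ np.linalg.inv(np.sqrt(G))
print(np.allclose(S, np.corrcoef(Z, rowvar=False) / D))   # -> True
```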
5. An illustrative numerical example
We look at a very simple example to see how the methods work. We randomly generated data with $D = 3$, $p_1 = 10$, $p_2 = 30$, $p_3 = 100$, $n_1 = 125$, $n_2 = 250$, $n_3 = 500$ in the following steps (a simulation sketch follows the list).

1. We placed 25 points on the $5 \times 5$ grid in $\mathbb{R}^2$ as $(1,1)^T, (1,2)^T, (1,3)^T, (1,4)^T, (1,5)^T, (2,1)^T, \ldots, (5,5)^T$ to define $(x_1^0)^T, \ldots, (x_{25}^0)^T$, where $d = 0$ is treated as a special domain for data generation. These 25 values are repeatedly used to define $x_i^0$ for $i = 26, 27, \ldots$.

2. We made random matrices $B^d \in \mathbb{R}^{p_d \times 2}$, $d = 1, 2, 3$, with all elements distributed as $N(0, 1)$ independently. Then, we generated the data vectors $x_i^d = B^d x_i^0 + \epsilon_i^d$, $i = 1, \ldots, n_d$. Elements of $\epsilon_i^d$ are independently distributed as zero-mean normal noise with small variance. Each column of $X^d$ is standardized to mean zero and variance one.

3. The numbers of data vectors $x_i^d$ generated from each grid point are 5, 10, 20, respectively, for $d = 1, 2, 3$. For defining the underlying true associations, we linked these 35 vectors to each other, except for those within the same domain. The true weights for these 35 vectors are $\bar w_{ij}^{de} = 1$ for $d \neq e$ and $\bar w_{ij}^{dd} = 0$. All other weights across grid points are zero. The numbers of nonzero elements (lower triangular) are 1250, 2500, 5000 (total 8750), respectively, for $\bar W^{12}$, $\bar W^{13}$, $\bar W^{23}$.

4. We made the weight matrices $W^{de}$ by randomly sampling 2% of the links from $\bar W^{de}$. The numbers of nonzero elements (lower triangular) became 28, 50, 97 (total 175), respectively, for $W^{12}$, $W^{13}$, $W^{23}$.
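A minimal simulation sketch of steps 1 and 2, in our own Python; the noise level `sigma` is a placeholder, since the exact noise variance is not fixed here.

```python
import numpy as np

rng = np.random.default_rng(0)
D, ps, ns = 3, [10, 30, 100], [125, 250, 500]
sigma = 0.1   # placeholder noise level (assumption, not the paper's exact value)
# the 5 x 5 grid of the special domain d = 0: x_1^0, ..., x_25^0
grid = np.array([(i, j) for i in range(1, 6) for j in range(1, 6)], dtype=float)
reps = [5, 10, 20]                            # vectors per grid point for d = 1, 2, 3
Xs = []
for d in range(D):
    B = rng.standard_normal((ps[d], 2))       # random B^d with N(0, 1) entries
    X0 = np.tile(grid, (reps[d], 1))          # grid points reused cyclically, n_d rows
    X = X0 @ B.T + sigma * rng.standard_normal((ns[d], ps[d]))
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column
    Xs.append(X)
```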
We applied the cross-domain matching with $\gamma_W = 0$, $\gamma_M = 0.1$, and $L_M = \mathrm{Diag}(L_M^1, L_M^2, L_M^3)$ with $L_M^d = \alpha_d I_{p_d}$ and $\alpha_d = \mathrm{tr}((X^d)^T M^d X^d)/p_d$. The results are shown in Fig. 1 and Fig. 2. As in PCA, we write PC$k$ for the $k$-th component $(y_i^d)_k$ of the common space. Scatter plots of the data vectors in the common space are shown in Fig. 1(a) and Fig. 1(b); the $5 \times 5$ grid structure of the generating domain is recovered there. Looking at the eigenvalues $\lambda_k$ (corresponding to the canonical correlations of CCA) in Fig. 1(c), they are almost 1 for PC1 and PC2, and decrease rapidly for $k \ge 3$, indicating that $K = 2$ is a good choice. The number of positive $\lambda_k$ is 40 ($= p_1 + p_2$ in this example). We only look at these 40 PCs, because negative $\lambda_k$ are due to a change of the sign of axes.

We pick a data vector ($d = 2$, $i = 1$) as a query and look for vectors close to it in the common space. This query vector can be treated as a new input, because it was not linked to any other vectors in $W^{de}$. In Fig. 1(d), the distances to the other vectors, $\|y_i^d - y_1^2\|$, $i = 1, \ldots, n_d$, $d = 1, \ldots, D$, are computed with $K = 2$. The "true" distances are computed as $\|x_i^0 - x_1^0\|$ between the underlying grid points. They agree very well, meaning that we will find closely related vectors.

What happens if we use a wrong $K$? Results are shown in Fig. 2. The observed distances in the common space are disturbed by PC3 in Fig. 2(a). The situation becomes worse in Fig. 2(b), and it is not possible to make a reasonable data retrieval any more. It is very important to choose an appropriate $K$.

Choosing $K$ and $\gamma_M$ by cross-validation. We write $A = A(W, \gamma_M)$ for (9) and $\phi(A, W)$ for (2), omitting $X$ from the notation. The error function is decomposed into each PC$k$ as $\phi(A, W) = \sum_{k=1}^K \phi_k(A, W)$ with
$$ \phi_k(A, W) = \frac{1}{2} \sum_{d=1}^D \sum_{e=1}^D \sum_{i=1}^{n_d} \sum_{j=1}^{n_e} w_{ij}^{de} \bigl( (y_i^d)_k - (y_j^e)_k \bigr)^2. $$
The weights $W = (w_{ij})$ are always rescaled to have $\sum_{i=1}^N \sum_{j=1}^N w_{ij} = 1$ in the computation of $\phi_k(\cdot, W)$ below. For verifying an appropriate value of $K$ and $\gamma_M$, the error $\phi_k(A(W, \gamma_M), \bar W)$ with respect to the true weights $\bar w_{ij}^{de}$ is computed for $\gamma_M = 0$ and several positive values, and plotted in Fig. 3(a). The error is small for $k = 1, 2$, and it rapidly increases for $k \ge 3$, confirming that $K = 2$ is the right choice. Also, we confirm that the errors in PC1 and PC2 are minimized at $\gamma_M = 0.1$.

The true weights $\bar w_{ij}^{de}$ are unknown in reality, and we have to compute the error only from the observed $w_{ij}^{de}$. However, the fitting error $\phi_k(A(W, \gamma_M), W)$ in Fig. 3(b) does not work well. The fitting error is minimized when $\gamma_M = 0$, but the prediction of unlinked pairs of vectors is not good, as seen in Fig. 3(c). Another issue we notice in Fig. 3(b) is that the fitting error for $\gamma_M = 0$ is not monotone increasing in PC; it becomes monotone when we rescale PC$k$ by the factor $\bigl( \sum_i (\sum_j w_{ij}) (y_i)_k^2 / \sum_i \sum_j w_{ij} \bigr)^{-1/2}$.

For estimating the true error, we then performed a cross-validation analysis as follows. 10% of the nonzero elements (lower triangular) of $W$ are resampled to make $W^*$. In other words, the elements of $W^*$ are defined as $w_{ij}^{*de} = w_{ij}^{de} z_{ij}^{*de}$, where the $z_{ij}^{*de}$ are generated by Bernoulli trials with $P(z_{ij}^{*de} = 1) = 0.1$ and $P(z_{ij}^{*de} = 0) = 0.9$. The number of nonzero elements (lower triangular) of $W^*$ was 19, and that of the remaining matrix $W - W^*$ was 156, from which we computed $\phi_k(A((W - W^*)/0.9, \gamma_M), W^*)$. By repeating this process 30 times, we computed the average error. This cross-validation error is shown in Fig. 3(d). The plot is very similar to Fig. 3(a), and we successfully choose $K = 2$, $\gamma_M = 0.1$. In fact, Shimodaira (2015) showed asymptotically as $N \to \infty$ that the cross-validation error unbiasedly estimates the true error by adjusting the bias of the fitting error.
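The resampling scheme above can be sketched as follows; this is our own illustration, reusing `spectral_embedding` from Section 2 and evaluating $\phi_k$ directly from its definition.

```python
import numpy as np

def cv_error(X, W, K, gamma_M, n_rep=30, frac=0.1, rng=None):
    """Hold out a Bernoulli(frac) subset W* of the observed matching
    weights, fit A on the rescaled remainder (W - W*)/(1 - frac), and
    evaluate the per-component error phi_k on W*; average over n_rep
    repetitions.  Reuses spectral_embedding() from Section 2."""
    rng = rng or np.random.default_rng(0)
    iu = np.triu_indices(W.shape[0], k=1)     # one triangle of the symmetric W
    errs = np.zeros((n_rep, K))
    for r in range(n_rep):
        z = rng.random(iu[0].size) < frac     # Bernoulli holdout indicators z*
        Wstar = np.zeros_like(W)
        Wtrain = np.zeros_like(W)
        Wstar[iu] = W[iu] * z
        Wtrain[iu] = W[iu] * (~z)
        Wstar = Wstar + Wstar.T               # symmetrize both parts
        Wtrain = (Wtrain + Wtrain.T) / (1.0 - frac)
        A, _ = spectral_embedding(X, Wtrain, K, gamma_M=gamma_M)
        Y = X @ A
        Ws = Wstar / Wstar.sum()              # rescale held-out weights to sum to one
        diffs = (Y[:, None, :] - Y[None, :, :]) ** 2   # ((y_i)_k - (y_j)_k)^2
        errs[r] = 0.5 * np.einsum('ij,ijk->k', Ws, diffs)
    return errs.mean(axis=0)
```

Scanning this average error over a grid of $K$ and $\gamma_M$ values and picking the minimizer gives the data-driven choice of both quantities.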
Acknowledgments

I would like to thank Kazuki Fukui and Haruhisa Nagata for helpful discussions.
References
Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation.

Chung, F. R. (1997). Spectral Graph Theory. American Mathematical Society.

Correa, N. M., Eichele, T., Adalı, T., Li, Y.-O. and Calhoun, V. D. (2010). Multi-set canonical correlation analysis for the fusion of concurrent single trial ERP and functional MRI. NeuroImage.

Daumé III, H. (2009). Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics.

Gong, Y., Ke, Q., Isard, M. and Lazebnik, S. (2014). A multi-view embedding space for modeling internet images, tags, and their semantics. International Journal of Computer Vision.

Hotelling, H. (1936). Relations between two sets of variates. Biometrika.

Huang, Z., Shan, S., Zhang, H., Lao, S. and Chen, X. (2013). Cross-view graph embedding. In Computer Vision–ACCV 2012.

Kan, M., Shan, S., Zhang, H., Lao, S. and Chen, X. (2012). Multi-view discriminant analysis. In Computer Vision–ECCV 2012.

Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika.

Kohonen, T. (1972). Correlation matrix memories. IEEE Transactions on Computers.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

Nakano, K. (1972). Associatron - a model of associative memory. IEEE Transactions on Systems, Man and Cybernetics.

Ng, A. Y., Jordan, M. I. and Weiss, Y. (2002). On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems.

Olshausen, B. A. and Field, D. J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology.

Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Shi, X., Liu, Q., Fan, W. and Yu, P. S. (2013). Transfer across completely different feature spaces via spectral embedding. IEEE Transactions on Knowledge and Data Engineering.

Shimodaira, H. (2015). Cross-validation of matching correlation analysis by resampling matching weights. (submitted).

Takane, Y., Hwang, H. and Abdi, H. (2008). Regularized multiple-set canonical correlation analysis. Psychometrika.

Tenenhaus, A. and Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika.

Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing.

Wang, K., He, R., Wang, W., Wang, L. and Tan, T. (2013). Learning coupled feature spaces for cross-modal matching. In Computer Vision (ICCV), 2013 IEEE International Conference on.

Yan, S., Xu, D., Zhang, B., Zhang, H.-J., Yang, Q. and Lin, S. (2007). Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Yuan, Y.-H. and Sun, Q.-S. (2014). Graph regularized multiset canonical correlations with applications to joint feature extraction. Pattern Recognition.

Yuan, Y.-H., Sun, Q.-S., Zhou, Q. and Xia, D.-S. (2011). A novel multiset integrated canonical correlation analysis framework and its application in feature fusion. Pattern Recognition.