A simple coding for cross-domain matching with dimension reduction via spectral graph embedding
Hidetoshi Shimodaira*
Division of Mathematical Science
Graduate School of Engineering Science
Osaka University
1-3 Machikaneyama-cho, Toyonaka, Osaka, Japan
e-mail: [email protected]

*Supported in part by Grant KAKENHI (24300106, 26120523) from MEXT of Japan.
Abstract:
Data vectors are obtained from multiple domains. They are feature vectors of images or vector representations of words. Domains may have different numbers of data vectors with different dimensions. These data vectors from multiple domains are projected to a common space by linear transformations in order to search for closely related vectors across domains. We would like to find projection matrices that minimize the distances between closely related data vectors. This formulation of cross-domain matching is regarded as an extension of the spectral graph embedding to the multi-domain setting, and it includes several multivariate analysis methods of statistics such as multiset canonical correlation analysis, correspondence analysis, and principal component analysis. Similar approaches have recently become very popular in pattern recognition and vision. In this paper, instead of proposing a novel method, we introduce an embarrassingly simple idea of coding the data vectors that explains all the above-mentioned approaches. A data vector is concatenated with zero vectors from all other domains to make an augmented vector. The cross-domain matching is then solved by applying the single-domain version of spectral graph embedding to these augmented vectors from all the domains. An interesting connection to the classical associative memory model of neural networks is also discussed by noticing a coding for association. A cross-validation method for choosing the dimension of the common space and a regularization parameter is discussed in an illustrative numerical example.
Keywords and phrases: multiple domains, common space, matching weight, multivariate analysis, canonical correlation analysis, spectral graph embedding, associative memory, sparse coding.
1. Introduction
We consider multiple domains for getting data vectors. Let $D$ be the number of domains, and $d = 1, \ldots, D$ denote each domain. For example, $d = 1$ may be for images, and $d = 2$ for words. From domain $d$, we get data vectors $x_i^d \in \mathbb{R}^{p_d}$, $i = 1, \ldots, n_d$, where $n_d$ is the number of data vectors and $p_d$ is the dimension of the data vectors. They may be image feature vectors for $d = 1$, and word vectors computed by word2vec (Mikolov et al., 2013) from texts for $d = 2$. Typically, $p_d$ is hundreds, and $n_d$ is thousands to millions. We would like to retrieve relevant words from an image query, and alternatively retrieve images from a word query.

We specify the strength of association between two data vectors $x_i^d$ and $x_j^e$ by a matching weight $w_{ij}^{de} \in \mathbb{R}$ for $d, e = 1, \ldots, D$, $i = 1, \ldots, n_d$, $j = 1, \ldots, n_e$. (Note that "matching" here is nothing related to that of graph theory.) We assume the weight is symmetric: $w_{ij}^{de} = w_{ji}^{ed}$. For example, $w_{11}^{12} = 3$ for the association between an image "apple" ($x_1^1$) and the word "apple" ($x_1^2$), and $w_{12}^{12} = 1$ for the association between the image "apple" and the word "red" ($x_2^2$). However, it could be the case that the image apple is unlabeled and $w_{11}^{12} = 0$, while the color may be automatically classified as red and $w_{12}^{12} = 1$ remains. Let $\bar w_{ij}^{de}$ be the matching weights representing the underlying true associations, and $w_{ij}^{de}$ be the observed ones sampled from the true associations. We assume $w_{ij}^{de} = \bar w_{ij}^{de}$ with a small probability, and $w_{ij}^{de} = 0$ otherwise, so that $W^{de} = (w_{ij}^{de}) \in \mathbb{R}^{n_d \times n_e}$ would be a sparse matrix.

The data vectors from all the domains will be projected to a single common space $\mathbb{R}^K$ for some $K > 0$.
Using a matrix $A^d \in \mathbb{R}^{p_d \times K}$, we define a linear transformation by
$$ y_i^d = (A^d)^T x_i^d, \quad i = 1, \ldots, n_d; \ d = 1, \ldots, D. \tag{1} $$
Here $T$ denotes the matrix transpose. Later we use matrix notation such as $\mathrm{tr}(\cdot)$ for the matrix trace, $\mathrm{diag}(\cdot)$ for a diagonal matrix, and $\mathrm{Diag}(\cdot)$ for a block diagonal matrix. Each element of $y_i^d \in \mathbb{R}^K$ is $(y_i^d)_k = (a_k^d)^T x_i^d$, $k = 1, \ldots, K$, where $a_k^d \in \mathbb{R}^{p_d}$ are defined as $A^d = (a_1^d, \ldots, a_K^d)$. The error function of cross-domain matching is
$$ \phi(A^1, \ldots, A^D) = \frac{1}{2} \sum_{d=1}^D \sum_{e=1}^D \sum_{i=1}^{n_d} \sum_{j=1}^{n_e} w_{ij}^{de} \, \| y_i^d - y_j^e \|^2, \tag{2} $$
and we would like to find $A^1, \ldots, A^D$ that minimize (2) subject to certain constraints. This is supervised learning with the matching weights as training data. It handles the problems of semi-supervised learning and missing observations by simply setting the unobserved weights to zero. For a new query image, say, the data vector $x^1 \in \mathbb{R}^{p_1}$ is transformed to $y^1 = (A^1)^T x^1$. Then we look for points close to $y^1$ in the collection $\{y_i^d\}$. By working on the common space in this way, we can perform data retrieval across domains and data fusion from multiple domains.

This formulation of cross-domain matching is regarded as an extension of the spectral graph embedding of Yan et al. (2007) to the multi-domain setting, and similar approaches have recently become very popular in pattern recognition and vision (Correa et al., 2010; Yuan et al., 2011; Kan et al., 2012; Huang et al., 2013; Shi et al., 2013; Wang et al., 2013; Gong et al., 2014; Yuan and Sun, 2014). In particular, the formulation reduces to a classical multivariate analysis of statistics, known as multiset canonical correlation analysis (MCCA) (Kettenring, 1971; Takane, Hwang and Abdi, 2008; Tenenhaus and Tenenhaus, 2011), by letting $n_1 = n_2 = \cdots = n_D$ and connecting all vectors across domains with the same index, i.e., $w_{ii}^{de} \neq 0$ and $w_{ij}^{de} = 0$ for $i \neq j$. Class labels are coded by indicator variables (called dummy variables in statistics) and treated as domains; they appear in canonical discriminant analysis and correspondence analysis. The formulation becomes the classical canonical correlation analysis (CCA) of Hotelling (1936) by further letting $D = 2$, or it becomes principal component analysis (PCA) by letting $p_1 = p_2 = \cdots = p_D = 1$.

In this paper, we do not intend to propose a novel method. Instead, we introduce an embarrassingly simple idea of coding the data vectors that explains all the above-mentioned approaches. This coding is similar to that of Daumé III (2009). Let $P = \sum_{d=1}^D p_d$ and $N = \sum_{d=1}^D n_d$. The data vector $x_i^d$ is coded as an augmented vector $\tilde x_i^d \in \mathbb{R}^P$ defined as
$$ (\tilde x_i^d)^T = \left( (0_{p_1})^T, \ldots, (0_{p_{d-1}})^T, (x_i^d)^T, (0_{p_{d+1}})^T, \ldots, (0_{p_D})^T \right). \tag{3} $$
Here, $0_p \in \mathbb{R}^p$ is the vector with all elements zero. This is a sparse coding (Olshausen and Field, 2004) in the sense that the nonzero elements for different domains do not overlap each other. All the $N$ vectors of all domains are now represented as points in the same $\mathbb{R}^P$. We will get the solution of the optimization problem of (2) by applying the single-domain version of the spectral graph embedding of Yan et al. (2007) to these $\tilde x_i^d$ vectors.

In Section 2, we review the spectral graph embedding methods. In Section 3, we show that the coding (3) solves the minimization of (2).
An interesting connection to the classical associative memory model of neural networks (Kohonen, 1972; Nakano, 1972) is also discussed there by noticing that the coding $\tilde x_i^d + \tilde x_j^e$ corresponds to the matching weight $w_{ij}^{de}$. In Section 4, the relations to the multivariate analysis methods are explained. In Section 5, we show an illustrative numerical example of cross-domain matching. In particular, we discuss a cross-validation method for choosing the dimension $K$ of the common space and a regularization parameter; there we resample the matching weights $w_{ij}^{de}$ instead of the data vectors $x_i^d$.
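To make the coding (3) concrete, here is a minimal Python sketch; the paper itself gives no code, so numpy and the function name `augment` are our own choices. It builds the augmented vector $\tilde x_i^d$ by padding a data vector with the zero blocks of all the other domains.

```python
import numpy as np

def augment(x, d, dims):
    """Code a data vector x from domain d (0-based index) as the augmented
    vector of eq. (3): x is placed in its own block, and all other domain
    blocks are zero, giving a vector in R^P with P = sum(dims)."""
    blocks = [x if e == d else np.zeros(p) for e, p in enumerate(dims)]
    return np.concatenate(blocks)

# A toy illustration with D = 3 domains of dimensions p_1, p_2, p_3.
dims = [2, 3, 4]                 # p_1 = 2, p_2 = 3, p_3 = 4, so P = 9
x = np.array([1.0, 2.0, 3.0])    # a data vector from domain d = 2
print(augment(x, 1, dims))       # -> [0. 0. 1. 2. 3. 0. 0. 0. 0.]
```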
2. A brief review of the spectral graph embedding
Before discussing the cross-domain matching, we review here the spectral graph theory (Chung, 1997) in Section 2.1. We then consider extra constraints in Section 2.2. These results will be used for the cross-domain matching in Section 3. The following argument is based on spectral clustering, in particular the normalized graph Laplacian (Shi and Malik, 2000; Ng et al., 2002; Von Luxburg, 2007), and the spectral embedding (Belkin and Niyogi, 2003).

2.1. Spectral embedding
Let $N$ be the number of vertices of an undirected graph, which we represent as points $y_i \in \mathbb{R}^K$, $i = 1, \ldots, N$, of dimension $K \le N$. The weighted adjacency matrix is $W = (w_{ij}) \in \mathbb{R}^{N \times N}$ with symmetric weights $w_{ij} = w_{ji} \ge 0$, $i, j = 1, \ldots, N$. Let $M = \mathrm{diag}(W 1_N) \in \mathbb{R}^{N \times N}$ be the diagonal matrix with elements $\sum_{j=1}^N w_{ij}$, $i = 1, \ldots, N$. Here $1_N \in \mathbb{R}^N$ denotes the vector with all elements being 1. The graph Laplacian is $M - W$.

For a given $W$, we would like to find $y_1, \ldots, y_N$ that minimize the error function
$$ \phi(y_1, \ldots, y_N) = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N w_{ij} \| y_i - y_j \|^2 \tag{4} $$
subject to certain constraints. For avoiding the trivial solution of all zero vectors, we assume the constraints
$$ Y^T M Y = I_K, \tag{5} $$
where $Y \in \mathbb{R}^{N \times K}$ is defined by $Y^T = (y_1, \ldots, y_N)$, and $I_K \in \mathbb{R}^{K \times K}$ is the identity matrix. By simple rearrangement of the formula, we get
$$ \mathrm{tr}(Y^T W Y) = \sum_{i=1}^N \sum_{j=1}^N w_{ij} \, y_i^T y_j, \qquad \mathrm{tr}(Y^T M Y) = \sum_{i=1}^N \Bigl( \sum_{j=1}^N w_{ij} \Bigr) \| y_i \|^2. $$
Thus the error function is rewritten as
$$ \phi(y_1, \ldots, y_N) = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N w_{ij} \bigl( \| y_i \|^2 + \| y_j \|^2 - 2 y_i^T y_j \bigr) = \mathrm{tr}(Y^T (M - W) Y) = K - \mathrm{tr}(Y^T W Y). $$
Therefore, minimization of (4) is equivalent to maximization of $\mathrm{tr}(Y^T W Y)$.

Let $M^{-1/2} \in \mathbb{R}^{N \times N}$ be the diagonal matrix with elements $(\sum_{j=1}^N w_{ij})^{-1/2}$, $i = 1, \ldots, N$. The eigenvalues of $M^{-1/2} W M^{-1/2}$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$, and the corresponding normalized eigenvectors are $u_1, \ldots, u_N \in \mathbb{R}^N$. The solution of minimizing (4) subject to (5) is given by $Y = M^{-1/2} (u_1, \ldots, u_K)$.

2.2. Extra constraints and regularization

In addition to the constraints (5), Yan et al. (2007) introduced extra constraints that the column vectors of $Y$ are included in a specified linear subspace. Let us specify $x_i \in \mathbb{R}^P$, $i = 1, \ldots, N$, with some $K \le P \le N$. Define the data matrix $X \in \mathbb{R}^{N \times P}$ by $X^T = (x_1, \ldots, x_N)$. We assume that $Y$ is expressed in the form
$$ Y = XA \tag{6} $$
using an arbitrary matrix $A \in \mathbb{R}^{P \times K}$. Therefore, minimization of (4) is equivalent to finding $A$ that maximizes $\mathrm{tr}(A^T X^T W X A)$ subject to the constraints $A^T X^T M X A = I_K$.

For numerical stability, we introduce quadratic regularization terms similar to those of Takane, Hwang and Abdi (2008). First, we define two $P \times P$ matrices by
$$ G = X^T M X + \gamma_M L_M, \qquad H = X^T W X + \gamma_W L_W. $$
Here $\gamma_M, \gamma_W \in \mathbb{R}$ are regularization parameters, and $L_M, L_W \in \mathbb{R}^{P \times P}$ are nonnegative definite, typically $L_M = L_W = I_P$. Then, we consider the optimization problem:
$$ \text{Maximize } \mathrm{tr}(A^T H A) \text{ with respect to } A \in \mathbb{R}^{P \times K} \tag{7} $$
$$ \text{subject to } A^T G A = I_K. \tag{8} $$
This reduces to the problem of Section 2.1 by letting $X = I_N$, $\gamma_M = \gamma_W = 0$. For the solution of the optimization problem, we let $G^{1/2} \in \mathbb{R}^{P \times P}$ denote one of the matrices satisfying $(G^{1/2})^T G^{1/2} = G$. Its inverse matrix is denoted by $G^{-1/2} = (G^{1/2})^{-1}$. These are easily computed by, say, the Cholesky decomposition or the spectral decomposition of a symmetric matrix. The eigenvalues of $(G^{-1/2})^T H G^{-1/2}$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_P$, and the corresponding normalized eigenvectors are $u_1, u_2, \ldots, u_P \in \mathbb{R}^P$. The solution of our optimization problem is
$$ A = G^{-1/2} (u_1, \ldots, u_K). \tag{9} $$
To see what we are actually solving, let us rewrite the error function (4) with respect to $A$ under the constraints (6) and (8).
\begin{align*}
\phi(A) &= \mathrm{tr}(Y^T (M - W) Y) \\
&= \mathrm{tr}(A^T X^T (M - W) X A) \\
&= \mathrm{tr}(A^T (G - H - \gamma_M L_M + \gamma_W L_W) A) \\
&= K - \mathrm{tr}(A^T H A) - \mathrm{tr}(A^T (\gamma_M L_M - \gamma_W L_W) A).
\end{align*}
Thus, maximization of (7) subject to (8) is equivalent to minimization of
$$ \phi(A) + \mathrm{tr}(A^T (\gamma_M L_M - \gamma_W L_W) A). \tag{10} $$
For the second term to work properly as a regularization term, $\gamma_M L_M - \gamma_W L_W$ should be nonnegative definite.
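The solution (9) is straightforward to compute numerically. The following Python sketch is our own illustration, assuming numpy/scipy and the typical choice $L_M = L_W = I_P$; the function name `spectral_embedding` is ours. It forms $G$ and $H$, takes a $G^{1/2}$ from the Cholesky decomposition, and collects the top-$K$ eigenvectors of $(G^{-1/2})^T H G^{-1/2}$.

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def spectral_embedding(X, W, K, gamma_M=0.0, gamma_W=0.0):
    """Minimal sketch of the regularized embedding of Section 2:
    maximize tr(A^T H A) subject to A^T G A = I_K, solved by eq. (9),
    with the typical choice L_M = L_W = I_P."""
    N, P = X.shape
    M = np.diag(W.sum(axis=1))                # M = diag(W 1_N)
    G = X.T @ M @ X + gamma_M * np.eye(P)
    H = X.T @ W @ X + gamma_W * np.eye(P)
    R = cholesky(G)                           # upper triangular, R^T R = G, so R is a G^{1/2}
    T1 = solve_triangular(R, H, trans='T')    # R^{-T} H
    S = solve_triangular(R, T1.T, trans='T').T  # (G^{-1/2})^T H G^{-1/2}
    lam, U = eigh(S)                          # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:K]           # take the K largest eigenvalues
    A = solve_triangular(R, U[:, idx])        # A = G^{-1/2} (u_1, ..., u_K), eq. (9)
    return A, lam[idx]
```

The embedded points are then obtained as $Y = XA$ by (6).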
3. Cross-domain matching correlation analysis
Now we are back to the cross-domain matching. We define several matrices for rewriting (1) and (2) in a simple form. The data matrices $X^d \in \mathbb{R}^{n_d \times p_d}$ for domains $d = 1, \ldots, D$ are defined by $(X^d)^T = (x_1^d, \ldots, x_{n_d}^d)$. We put these $D$ matrices in the block diagonal positions of an $N \times P$ matrix to define a large data matrix $X = \mathrm{Diag}(X^1, \ldots, X^D) \in \mathbb{R}^{N \times P}$. We concatenate the transformation matrices to define $A \in \mathbb{R}^{P \times K}$ as $A^T = ((A^1)^T, \ldots, (A^D)^T)$. The vectors in the common space are also concatenated to define $Y^d \in \mathbb{R}^{n_d \times K}$ and $Y \in \mathbb{R}^{N \times K}$ as $(Y^d)^T = (y_1^d, \ldots, y_{n_d}^d)$, $Y^T = ((Y^1)^T, \ldots, (Y^D)^T)$. The matching weight matrices are $W^{de} = (w_{ij}^{de}) \in \mathbb{R}^{n_d \times n_e}$ for $d, e = 1, \ldots, D$, and they are placed in a $D \times D$ array to define $W = (W^{de}) \in \mathbb{R}^{N \times N}$.

Using these matrices, the transformation (1) is written as (6), and the error function (2) is written as (4), or $\phi(A)$ of Section 2.2. Adding the regularization term to the error function, the objective function becomes (10), and the solution is (9). Thus, the cross-domain matching is solved by the single-domain version of the spectral graph embedding. An important point is that the large data matrix $X$ is expressed as
$$ X^T = (\tilde x_1^1, \ldots, \tilde x_{n_1}^1, \ldots, \tilde x_1^D, \ldots, \tilde x_{n_D}^D), $$
meaning that $X$ is the data matrix consisting of the augmented vectors. What we have done is, therefore, interpreted as simply applying the spectral graph embedding of Yan et al. (2007) to the $N$ augmented vectors in $\mathbb{R}^P$.

It would be better to rewrite the constraints (8) in terms of $A^1, \ldots, A^D$ for cross-domain matching. Notice $M = \mathrm{Diag}(M^1, \ldots, M^D)$ with $M^d = \mathrm{diag}((W^{d1}, \ldots, W^{dD}) 1_N)$, and so $X^T M X = \mathrm{Diag}((X^1)^T M^1 X^1, \ldots, (X^D)^T M^D X^D)$. For simplicity, we assume that the regularization matrix is written as a block diagonal matrix $L_M = \mathrm{Diag}(L_M^1, \ldots, L_M^D)$. Then we have
$$ A^T G A = \sum_{d=1}^D (A^d)^T \bigl( (X^d)^T M^d X^d + \gamma_M L_M^d \bigr) A^d = I_K. $$
This is expressed for the vectors in $A^d$ as
$$ \sum_{d=1}^D (a_k^d)^T \bigl( (X^d)^T M^d X^d + \gamma_M L_M^d \bigr) a_l^d = \delta_{kl}, \quad k, l = 1, \ldots, K, \tag{11} $$
using the Kronecker delta.

As a final remark of this section, we discuss a coding of matching for further implications. Let $E$ be the number of nonzero elements in the lower triangular part of $W$. In other words, $E$ is the number of edges in the graph. We define a diagonal matrix $\breve W \in \mathbb{R}^{E \times E}$ with elements of these nonzero $\{w_{ij}^{de}\}$. Instead of working on the vertices of the graph, here we work on the edges of the graph for data analysis. So, the data vector is now coded as $\tilde x_i^d + \tilde x_j^e$ for the matching weight $w_{ij}^{de}$. We define the data matrix $\breve X \in \mathbb{R}^{E \times P}$ by concatenating $\tilde x_i^d + \tilde x_j^e$ in the same order as $\breve W$. Since $\breve X^T \breve W \breve X = X^T M X + X^T W X$, minimization of (10) is equivalent to maximization of $\mathrm{tr}(A^T (\breve X^T \breve W \breve X + \gamma_M L_M + \gamma_W L_W) A)$. Therefore, the cross-domain matching is interpreted as a kind of PCA for input patterns coded as $\tilde x_i^d + \tilde x_j^e$. Interestingly, this idea is found in one of the classical neural network models. Any part of the memorized vector can be used as a key for recalling the whole vector in the auto-associative correlation matrix memory (Kohonen, 1972; Nakano, 1972). This associative memory may recall $\tilde x_i^d + \tilde x_j^e$ for the input key either $\tilde x_i^d$ or $\tilde x_j^e$. It would be a subject of future research to work on $\tilde x_i^d + \tilde x_j^e + \tilde x_k^f + \cdots$ for joint associations of three or more vectors.
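The reduction of this section amounts to two bookkeeping steps: stacking the domain data matrices block-diagonally (equivalently, forming the augmented vectors (3)) and tiling the matching weight matrices into the $N \times N$ array $W$. Here is a minimal sketch in that spirit; it reuses the `spectral_embedding` function sketched in Section 2, and the argument conventions (a list of $X^d$ and a nested list of $W^{de}$) are our own.

```python
import numpy as np
from scipy.linalg import block_diag

def cross_domain_matching(Xs, Ws, K, gamma_M=0.0, gamma_W=0.0):
    """Sketch of Section 3.  Xs: list of X^d (n_d x p_d) data matrices;
    Ws: nested list with Ws[d][e] = W^{de} (n_d x n_e matching weights).
    Applies the single-domain embedding to the augmented representation."""
    D = len(Xs)
    X = block_diag(*Xs)          # rows of X are the augmented vectors of eq. (3)
    W = np.block([[Ws[d][e] for e in range(D)] for d in range(D)])
    A, lam = spectral_embedding(X, W, K, gamma_M=gamma_M, gamma_W=gamma_W)
    Y = X @ A                    # all common-space vectors y_i^d stacked, eqs. (1)/(6)
    ps = np.cumsum([Xd.shape[1] for Xd in Xs])[:-1]
    return np.split(A, ps, axis=0), Y, lam   # per-domain A^1, ..., A^D
```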
4. Relations to multiset canonical correlation analysis
In this section, we assume that the numbers of vectors are the same for all domains. Then the cross-domain matching reduces to a classical multivariate analysis of statistics. Let $n_1 = \cdots = n_D = n$ and $N = nD$. We assume that the weight matrix is specified as $W^{de} = c_{de} I_n$ using coefficients $c_{de} \ge 0$, $d, e = 1, \ldots, D$. In this case, the cross-domain matching becomes a version of MCCA, where the connections between sets of variables are specified by the coefficients $c_{de}$ (Tenenhaus and Tenenhaus, 2011). Another version of MCCA with all $c_{de} = 1$ is discussed extensively in Takane, Hwang and Abdi (2008).

Here we show how the objective function (7) and the constraints (8) are expressed in the case of MCCA. Noting that $X^T W X$ is an array of $(X^d)^T W^{de} X^e = c_{de} (X^d)^T X^e$, $d, e = 1, \ldots, D$, we have
$$ \mathrm{tr}(A^T H A) = \sum_{d=1}^D \sum_{e=1}^D c_{de} \, \mathrm{tr}((A^d)^T (X^d)^T X^e A^e) + \gamma_W \sum_{d=1}^D \mathrm{tr}((A^d)^T L_W^d A^d) = \sum_{k=1}^K \Bigl( \sum_{d=1}^D \sum_{e=1}^D c_{de} (a_k^d)^T (X^d)^T X^e a_k^e + \sum_{d=1}^D \gamma_W (a_k^d)^T L_W^d a_k^d \Bigr). \tag{12} $$
For simplicity, we assumed that the regularization matrix is written as a block diagonal matrix $L_W = \mathrm{Diag}(L_W^1, \ldots, L_W^D)$. The constraints (8) are expressed as (11) with $M^d = (\sum_{e=1}^D c_{de}) I_n$. The constraints correspond to eq. (31) of Takane, Hwang and Abdi (2008), except for a difference in scaling, when $c_{de} = 1$, $L_M = L_W$, and $\gamma_M = D \gamma_W$.

Further assume that $p_1 = \cdots = p_D = 1$ and $P = D$. Each $X^d \in \mathbb{R}^{n \times 1}$ is a vector now, and $(G^{-1/2})^T H G^{-1/2}$ becomes the sample correlation matrix scaled by the factor $D^{-1}$. Thus, the cross-domain matching is equivalent to PCA.
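This PCA reduction is easy to verify numerically. The check below is our own (with $\gamma_M = \gamma_W = 0$ and all $c_{de} = 1$, so that $M^d = D I_n$ and each standardized column satisfies $(X^d)^T X^d = n$): for $D$ standardized one-dimensional domains, $(G^{-1/2})^T H G^{-1/2}$ equals the sample correlation matrix divided by $D$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 200, 4
Z = rng.standard_normal((n, D))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardize each column, as in Section 5
Xs = [Z[:, [d]] for d in range(D)]         # X^d in R^{n x 1}, i.e. p_d = 1
G = np.diag([n * D] * D)                   # X^T M X: M^d = D I_n, (X^d)^T X^d = n
H = np.block([[Xs[d].T @ Xs[e] for e in range(D)] for d in range(D)])
S = np.linalg.inv(np.sqrt(G)) @ H @ np.linalg.inv(np.sqrt(G))
print(np.allclose(S, np.corrcoef(Z, rowvar=False) / D))   # -> True
```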
5. An illustrative numerical example
We look at a very simple example to see how the methods work. We randomly generated data with $D = 3$, $p_1 = 10$, $p_2 = 30$, $p_3 = 100$, $n_1 = 125$, $n_2 = 250$, $n_3 = 500$ in the following steps (a simulation sketch follows the list).

1. We placed 25 points on the $5 \times 5$ grid in $\mathbb{R}^2$ as $(1,1)^T, (1,2)^T, (1,3)^T, (1,4)^T, (1,5)^T, (2,1)^T, \ldots, (5,5)^T$ to define $(x_1^0)^T, \ldots, (x_{25}^0)^T$, where $d = 0$ is treated as a special domain for data generation. These 25 values are repeatedly used to define $x_i^0$ for $i = 26, 27, \ldots$.

2. We made random matrices $B^d \in \mathbb{R}^{p_d \times 2}$, $d = 1, 2, 3$, with all elements distributed as $N(0, 1)$ independently. Then, we generated the data vectors $x_i^d = B^d x_i^0 + \epsilon_i^d$, $i = 1, \ldots, n_d$. Elements of $\epsilon_i^d$ are independently distributed as zero-mean normal noise with small variance. Each column of $X^d$ is standardized to mean zero and variance one.

3. The numbers of data vectors $x_i^d$ generated from each grid point are 5, 10, 20, respectively, for $d = 1, 2, 3$. For defining the underlying true associations, we linked these 35 vectors to each other, except for those within the same domain. The true weights for these 35 vectors are $\bar w_{ij}^{de} = 1$ for $d \neq e$ and $\bar w_{ij}^{dd} = 0$. All other weights across grid points are zero. The numbers of nonzero elements (lower triangular) are 1250, 2500, 5000 (total 8750), respectively, for $\bar W^{12}$, $\bar W^{13}$, $\bar W^{23}$.

4. We made the weight matrices $W^{de}$ by randomly sampling 2% of the links from $\bar W^{de}$. The numbers of nonzero elements (lower triangular) became 28, 50, 97 (total 175), respectively, for $W^{12}$, $W^{13}$, $W^{23}$.
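A minimal simulation sketch of steps 1 and 2, in our own Python; the noise level `sigma` is a placeholder, since the exact noise variance is not fixed here.

```python
import numpy as np

rng = np.random.default_rng(0)
D, ps, ns = 3, [10, 30, 100], [125, 250, 500]
sigma = 0.1   # placeholder noise level (assumption, not the paper's exact value)
# the 5 x 5 grid of the special domain d = 0: x_1^0, ..., x_25^0
grid = np.array([(i, j) for i in range(1, 6) for j in range(1, 6)], dtype=float)
reps = [5, 10, 20]                            # vectors per grid point for d = 1, 2, 3
Xs = []
for d in range(D):
    B = rng.standard_normal((ps[d], 2))       # random B^d with N(0, 1) entries
    X0 = np.tile(grid, (reps[d], 1))          # grid points reused cyclically, n_d rows
    X = X0 @ B.T + sigma * rng.standard_normal((ns[d], ps[d]))
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column
    Xs.append(X)
```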
We applied the cross-domain matching with $\gamma_W = 0$, $\gamma_M = 0.1$, and $L_M = \mathrm{Diag}(L_M^1, L_M^2, L_M^3)$ with $L_M^d = \alpha_d I_{p_d}$ and $\alpha_d = \mathrm{tr}((X^d)^T M^d X^d)/p_d$. The results are shown in Fig. 1 and Fig. 2. As in PCA, we write PC$k$ for the $k$-th component $(y_i^d)_k$ of the common space. Scatter plots of the data vectors in the common space are shown in Fig. 1(a) and Fig. 1(b); the $5 \times 5$ grid structure of the generating domain is recovered there. Looking at the eigenvalues $\lambda_k$ (corresponding to the canonical correlations of CCA) in Fig. 1(c), they are almost 1 for PC1 and PC2, and decrease rapidly for $k \ge 3$, indicating that $K = 2$ is a good choice. The number of positive $\lambda_k$ is 40 ($= p_1 + p_2$ in this example). We only look at these 40 PCs, because negative $\lambda_k$ are due to a change of the sign of axes.

We pick a data vector ($d = 2$, $i = 1$) as a query and look for vectors close to it in the common space. This query vector can be treated as a new input, because it was not linked to any other vectors in $W^{de}$. In Fig. 1(d), the distances to the other vectors, $\|y_i^d - y_1^2\|$, $i = 1, \ldots, n_d$, $d = 1, \ldots, D$, are computed with $K = 2$. The "true" distances are computed as $\|x_i^0 - x_1^0\|$ between the underlying grid points. They agree very well, meaning that we will find closely related vectors.

What happens if we use a wrong $K$? Results are shown in Fig. 2. The observed distances in the common space are disturbed by PC3 in Fig. 2(a). The situation becomes worse in Fig. 2(b), and it is not possible to make a reasonable data retrieval any more. It is very important to choose an appropriate $K$.

Choosing $K$ and $\gamma_M$ by cross-validation. We write $A = A(W, \gamma_M)$ for (9) and $\phi(A, W)$ for (2), omitting $X$ from the notation. The error function is decomposed into each PC$k$ as $\phi(A, W) = \sum_{k=1}^K \phi_k(A, W)$ with
$$ \phi_k(A, W) = \frac{1}{2} \sum_{d=1}^D \sum_{e=1}^D \sum_{i=1}^{n_d} \sum_{j=1}^{n_e} w_{ij}^{de} \bigl( (y_i^d)_k - (y_j^e)_k \bigr)^2. $$
The weights $W = (w_{ij})$ are always rescaled to have $\sum_{i=1}^N \sum_{j=1}^N w_{ij} = 1$ in the computation of $\phi_k(\cdot, W)$ below. For verifying an appropriate value of $K$ and $\gamma_M$, the error $\phi_k(A(W, \gamma_M), \bar W)$ with respect to the true weights $\bar w_{ij}^{de}$ is computed for $\gamma_M = 0$ and several positive values, and plotted in Fig. 3(a). The error is small for $k = 1, 2$, and it rapidly increases for $k \ge 3$, confirming that $K = 2$ is the right choice. Also, we confirm that the errors in PC1 and PC2 are minimized at $\gamma_M = 0.1$.

The true weights $\bar w_{ij}^{de}$ are unknown in reality, and we have to compute the error only from the observed $w_{ij}^{de}$. However, the fitting error $\phi_k(A(W, \gamma_M), W)$ in Fig. 3(b) does not work well. The fitting error is minimized when $\gamma_M = 0$, but the prediction of unlinked pairs of vectors is not good, as seen in Fig. 3(c). Another issue we notice in Fig. 3(b) is that the fitting error for $\gamma_M = 0$ is not monotone increasing in PC; it becomes monotone when we rescale PC$k$ by the factor $\bigl( \sum_i (\sum_j w_{ij}) (y_i)_k^2 / \sum_i \sum_j w_{ij} \bigr)^{-1/2}$.

For estimating the true error, we then performed a cross-validation analysis as follows. 10% of the nonzero elements (lower triangular) of $W$ are resampled to make $W^*$. In other words, the elements of $W^*$ are defined as $w_{ij}^{*de} = w_{ij}^{de} z_{ij}^{*de}$, where the $z_{ij}^{*de}$ are generated by Bernoulli trials with $P(z_{ij}^{*de} = 1) = 0.1$ and $P(z_{ij}^{*de} = 0) = 0.9$. The number of nonzero elements (lower triangular) of $W^*$ was 19, and that of the remaining matrix $W - W^*$ was 156, from which we computed $\phi_k(A((W - W^*)/0.9, \gamma_M), W^*)$. By repeating this process 30 times, we computed the average error. This cross-validation error is shown in Fig. 3(d). The plot is very similar to Fig. 3(a), and we successfully choose $K = 2$, $\gamma_M = 0.1$. In fact, Shimodaira (2015) showed asymptotically as $N \to \infty$ that the cross-validation error unbiasedly estimates the true error by adjusting the bias of the fitting error.
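The resampling scheme above can be sketched as follows; this is our own illustration, reusing `spectral_embedding` from Section 2 and evaluating $\phi_k$ directly from its definition.

```python
import numpy as np

def cv_error(X, W, K, gamma_M, n_rep=30, frac=0.1, rng=None):
    """Hold out a Bernoulli(frac) subset W* of the observed matching
    weights, fit A on the rescaled remainder (W - W*)/(1 - frac), and
    evaluate the per-component error phi_k on W*; average over n_rep
    repetitions.  Reuses spectral_embedding() from Section 2."""
    rng = rng or np.random.default_rng(0)
    iu = np.triu_indices(W.shape[0], k=1)     # one triangle of the symmetric W
    errs = np.zeros((n_rep, K))
    for r in range(n_rep):
        z = rng.random(iu[0].size) < frac     # Bernoulli holdout indicators z*
        Wstar = np.zeros_like(W)
        Wtrain = np.zeros_like(W)
        Wstar[iu] = W[iu] * z
        Wtrain[iu] = W[iu] * (~z)
        Wstar = Wstar + Wstar.T               # symmetrize both parts
        Wtrain = (Wtrain + Wtrain.T) / (1.0 - frac)
        A, _ = spectral_embedding(X, Wtrain, K, gamma_M=gamma_M)
        Y = X @ A
        Ws = Wstar / Wstar.sum()              # rescale held-out weights to sum to one
        diffs = (Y[:, None, :] - Y[None, :, :]) ** 2   # ((y_i)_k - (y_j)_k)^2
        errs[r] = 0.5 * np.einsum('ij,ijk->k', Ws, diffs)
    return errs.mean(axis=0)
```

Scanning this average error over a grid of $K$ and $\gamma_M$ values and picking the minimizer gives the data-driven choice of both quantities.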
Acknowledgments

I would like to thank Kazuki Fukui and Haruhisa Nagata for helpful discussions.
References
Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation.

Chung, F. R. (1997). Spectral Graph Theory. American Mathematical Society.

Correa, N. M., Eichele, T., Adalı, T., Li, Y.-O. and Calhoun, V. D. (2010). Multi-set canonical correlation analysis for the fusion of concurrent single trial ERP and functional MRI. NeuroImage.

Daumé III, H. (2009). Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics.

Gong, Y., Ke, Q., Isard, M. and Lazebnik, S. (2014). A multi-view embedding space for modeling internet images, tags, and their semantics. International Journal of Computer Vision.

Hotelling, H. (1936). Relations between two sets of variates. Biometrika.

Huang, Z., Shan, S., Zhang, H., Lao, S. and Chen, X. (2013). Cross-view graph embedding. In Computer Vision–ACCV 2012.

Kan, M., Shan, S., Zhang, H., Lao, S. and Chen, X. (2012). Multi-view discriminant analysis. In Computer Vision–ECCV 2012.

Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika.

Kohonen, T. (1972). Correlation matrix memories. IEEE Transactions on Computers.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

Nakano, K. (1972). Associatron - a model of associative memory. IEEE Transactions on Systems, Man and Cybernetics.

Ng, A. Y., Jordan, M. I. and Weiss, Y. (2002). On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems.

Olshausen, B. A. and Field, D. J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology.

Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Shi, X., Liu, Q., Fan, W. and Yu, P. S. (2013). Transfer across completely different feature spaces via spectral embedding. IEEE Transactions on Knowledge and Data Engineering.

Shimodaira, H. (2015). Cross-validation of matching correlation analysis by resampling matching weights. (submitted).

Takane, Y., Hwang, H. and Abdi, H. (2008). Regularized multiple-set canonical correlation analysis. Psychometrika.

Tenenhaus, A. and Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika.

Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing.

Wang, K., He, R., Wang, W., Wang, L. and Tan, T. (2013). Learning coupled feature spaces for cross-modal matching. In Computer Vision (ICCV), 2013 IEEE International Conference on.

Yan, S., Xu, D., Zhang, B., Zhang, H.-J., Yang, Q. and Lin, S. (2007). Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Yuan, Y.-H. and Sun, Q.-S. (2014). Graph regularized multiset canonical correlations with applications to joint feature extraction. Pattern Recognition.

Yuan, Y.-H., Sun, Q.-S., Zhou, Q. and Xia, D.-S. (2011). A novel multiset integrated canonical correlation analysis framework and its application in feature fusion. Pattern Recognition.