Schroedinger Eigenmaps for the Analysis of Bio-Medical Data
Wojciech Czaja and Martin Ehler
Abstract—We introduce Schroedinger Eigenmaps, a new semi-supervised manifold learning and recovery technique. This method is based on an implementation of graph Schroedinger operators with appropriately constructed barrier potentials as carriers of labeled information. We use our approach for the analysis of standard bio-medical datasets and new multispectral retinal images.
Index Terms—Schroedinger Eigenmaps, Laplacian Eigenmaps, Schroedinger operator on a graph, barrier potential, dimension reduction, manifold learning.
I. INTRODUCTION
Typical modern bio-medical data are characterized by high complexity (usually by means of high dimensionality), and the underlying nonlinear processes are often unknown, forming a fundamental barrier to a better understanding of human physiology. Moreover, as personalized medicine expands, increasingly detailed patient data need to be integrated in clinical trials and individual treatment decisions, which usually results in a complicated classification problem. Data driven analysis schemes are commonly based on kernels that derive the proper classification directly from the data. However, bio-medical applications are also characterized by a low signal to noise ratio, so that effective schemes require additional expert input. This outside information is, depending on the perspective, called labeling, (semi-)supervision, regularization, or learning from training data. Often, because of the nature of medical experiments, there are only a few training points available, and analysis schemes are needed that learn from these few data points most efficiently, while being robust against perturbations.

Examples of kernel based analysis techniques are Locally Linear Embedding (LLE) [37], Hessian-based LLE [23], Laplacian Eigenmaps [4], Diffusion Wavelets [17], [18], or Diffusion Maps [16], and kernel PCA and Support Vector Machines [39]. These methods represent the data in the form of a graph, with nodes that are formed by the data vectors, and with edges that represent the distances between pairs of such vectors. This information is stored in the adjacency matrix, which then is modified to form the kernel.
W. Czaja is with the University of Maryland, Department of Mathematics, College Park, MD 20742 ([email protected]). M. Ehler is with Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Biomathematics and Biometry, Ingolstädter Landstr. 1, D-85764 Neuherberg, Germany ([email protected]). He is also with the National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Section on Medical Biophysics, 9 Memorial Drive, Bethesda, MD 20892 ([email protected]).
Physical or experimental constraints often suggest that the data points belong to a low-dimensional manifold. Many kernel-based methods recover this manifold by means of representations in terms of the most significant eigenvectors of the kernel matrix [42]. A major feature of Laplacian Eigenmaps and related methods is the preservation of local spectral distances while reducing the overall dimension of the data represented by the aforementioned graph, cf. [15]. Laplacian Eigenmaps have been successfully applied to pattern recognition and segmentation problems [40], [46] and [2], [8].

Laplacian Eigenmaps and similar methods originally led to fully automated classification algorithms but have also been modified to work in semi-supervised settings: Belkin and Niyogi [3], [5], for instance, proposed to use classifiers induced from partially labeled data on the Laplacian representation of unlabeled points. Belkin et al. [7] used regularization in reproducing kernel Hilbert spaces to obtain a family of semi-supervised and transductive learning algorithms related to Laplacian Eigenmaps. In [35], extensions of the regularization methods proposed in [7] are obtained via inverse regression. In related developments, Zhu et al. [49] proposed to augment the data graph with the results of a prior classifier. Local discriminant analysis to locally modify ("warp") the diffusion metric is proposed by Coifman et al. in [17]. Szlam et al. [41] developed a further generalization by means of collecting all the class information into a modification of the metric used to build the kernel.

In the present paper we develop a new
Schroedinger Eigenmaps algorithm that builds upon Laplacian Eigenmaps and steers the diffusion process within the uniform theory of Schroedinger operators in place of the Laplacian. Experts (e.g., trained practitioners) can introduce their input in the form of a barrier potential for the associated Schroedinger operator on a graph to improve the detection and classification processes by steering the Schroedinger diffusion process effectively. At the same time, the new graph Schroedinger operator converges towards the Schroedinger operator on the manifold (just like the Laplacian kernel converges towards the Laplace-Beltrami operator), see Section III-B.

Such defined barrier Schroedinger potentials are easy to implement and appear in our numerical experiments to be well-suited for the analysis of low quality data with only few training points available, a typical situation in bio-medical applications. After having validated the usefulness of Schroedinger Eigenmaps (SE) by comparison to Support Vector Machines (SVM) on three standard medical datasets, we apply our method to analyze multispectral retinal images and genetic expression profiles. The proposed scheme provides two major features for data segmentation. On one hand, by an appropriate selection of the potential, it allows the practitioner to separate data vectors which should not appear in the same class. Specifically, in bio-medical imaging, the practitioner could be a physician who has additional information from results of other imaging devices or from physiological processes and knowledge about disease progression. On the other hand, Schroedinger Eigenmaps allow the identification of complex regions by means of decreasing the relative distances between labeled representatives. For the physician who has already identified several different regions of pathology, this technique allows these regions to be collected into one pathology class and therefore helps to better identify normal physiology.

In Section II we review Laplacian Eigenmaps. We introduce the new Schroedinger Eigenmaps with an investigation of their properties in Section III, where we also visualize the behavior on artificial data. Section III-E is dedicated to the description of the intertwining of Schroedinger Eigenmaps and classification schemes. In Section IV, we compare Schroedinger Eigenmaps with Support Vector Machines on standard medical datasets. We also analyze new multispectral retinal images.
II. LAPLACIAN EIGENMAPS

Given $m$ points $\{x_1,\dots,x_m\}\subset\mathbb{R}^N$, we assume that they belong to an $n$-dimensional manifold, where $n$ is much smaller than $N$. The goal is to recover this manifold or, in other words, to find a low-dimensional representation $\{y_1,\dots,y_m\}\subset\mathbb{R}^n$ of the original dataset $\{x_1,\dots,x_m\}\subset\mathbb{R}^N$. Here, we briefly recall the three-step procedure of Laplacian Eigenmaps, see [4]:

Step 1. Adjacency graph construction: Build a graph $G$ whose nodes $i$ and $j$ are connected if $x_i$ is among the $k$ nearest neighbors of $x_j$ or vice versa. The distances between data points are measured in the Euclidean metric. The graph $G$ represents the connectivity of the data, and $k$ is a parameter.

Step 2. Heat kernel as weights: Weight the edges of the graph. A typical example is the diffusion weight matrix $W$ given by
$$W_{i,j}=\begin{cases} e^{-\|x_i-x_j\|^2/\sigma}, & i\text{ and }j\text{ are connected},\\ 0, & \text{otherwise}.\end{cases}$$

Step 3. Solving an eigenvalue problem: Let $D$ be the diagonal matrix with $D_{i,i}=\sum_j W_{i,j}$. Denote the low-dimensional representation by an $m\times n$ matrix $y=(y_1,\dots,y_m)^\top$, where each row $y_i$ is considered as a vector in $\mathbb{R}^n$. Next, consider the following minimization problem:
$$\min_{y^\top Dy=I}\ \tfrac{1}{2}\sum_{i,j}\|y_i-y_j\|^2\,W_{i,j}=\min_{y^\top Dy=I}\ \operatorname{trace}(y^\top Ly), \qquad (1)$$
where $L=D-W$ is the $m\times m$ graph Laplace operator and $I$ is the identity matrix. The minimizer of (1) is given by the $n$ minimal eigenvalue solutions of $Lx=\lambda Dx$ under the constraint $y^\top Dy=I$, i.e., the columns of the minimizer $y^{(0)}$ are the $n$ eigenvectors with respect to the smallest eigenvalues. If the graph is connected, then $\mathbb{1}=(1,\dots,1)^\top$ is the only eigenvector with eigenvalue $0$. Instead of (1), we solve
$$\min\ \operatorname{trace}(y^\top Ly)\quad\text{subject to}\quad y^\top Dy=I,\ \ y^\top D\mathbb{1}=0.$$
By applying $z=D^{1/2}y$, this yields $\min\ \operatorname{trace}(z^\top\mathcal{L}z)$ subject to $z^\top z=I$ and $z^\top D^{1/2}\mathbb{1}=0$, where $\mathcal{L}=D^{-1/2}LD^{-1/2}$. The minimizer $z^{(0)}$ is given by the $n$ eigenvectors with smallest nonzero eigenvalues, and we obtain the sought $n$-dimensional representation $\{y_1,\dots,y_m\}$ from $y^{(0)}=D^{-1/2}z^{(0)}$.

Though algorithms based on spectral decomposition, such as Laplacian Eigenmaps, provide a powerful tool for nonlinear dimensionality reduction and manifold learning, they are not without certain shortcomings. We refer the interested reader to, e.g., [28], [29], [43].
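The three steps translate directly into code. The following is a minimal sketch in Python/SciPy, assuming a dataset small enough for dense, exact neighbor search; the function names, defaults, and organization are ours, not part of [4].

```python
import numpy as np
from scipy.linalg import eigh

def knn_heat_graph(X, k=10, sigma=1.0):
    """Steps 1-2: symmetric kNN adjacency with heat-kernel weights."""
    m = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]      # k nearest neighbors, excluding self
    A = np.zeros((m, m), dtype=bool)
    A[np.arange(m)[:, None], idx] = True
    A = A | A.T                                   # connect i and j "or vice versa"
    W = np.where(A, np.exp(-d2 / sigma), 0.0)     # heat-kernel weights
    D = np.diag(W.sum(axis=1))
    return W, D, D - W                            # weights, degrees, Laplacian L = D - W

def laplacian_eigenmaps(X, n_components=2, k=10, sigma=1.0):
    """Step 3: generalized eigenproblem L y = lambda D y; assumes k is large
    enough that the graph is connected, so the constant eigenvector is the
    unique eigenvector with eigenvalue 0 and is skipped."""
    W, D, L = knn_heat_graph(X, k, sigma)
    _, vecs = eigh(L, D)                          # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]
```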
III. SCHROEDINGER EIGENMAPS

The idea of using solutions of the Laplace equation for labeling is not new. In [30], for instance, the concept is utilized for image segmentation, where a small number of pixels is labeled, and random walk theory is then applied to classify unlabeled surrounding pixels. We aim to develop a more general classification framework by means of Schroedinger operators. This approach allows for more flexible labeling schemes. Moreover, a significant advantage of the Schroedinger Eigenmaps technique is that it applies not just to surrounding pixels, but to any image location, allowing for classes whose members are not spatially connected. This more general approach enables its application to data that do not encode location as, for instance, gene expressions.

The motivation behind this approach is two-fold. Mathematically, we are interested in shifting the analysis of complex, high-dimensional phenomena towards studying the associated operators acting on these data-dependent graphs, and away from analyzing pointwise relationships between the graph nodes. Physically, partially labeled sets naturally lead to the notion of barrier potentials and to the way in which these potentials affect the diffusion processes on graphs. As such, by appropriately choosing locations of the potential barriers we can steer the diffusion process to allow for identification of the correct cluster containing the labels, when compared to the unobstructed diffusion process of the Laplacian Eigenmaps algorithm.

We would like to point out that [1] and [11] deal with Schroedinger equations with Hamiltonians which are Laplacians without any potential. We, on the other hand, deal with Schroedinger operators, which are Laplacians with barrier potentials; as such, these are two different generalizations of the Laplace operator approach.
A. From Laplace to Schroedinger
The matrices $L$ and $\mathcal{L}$ in the previous section can be associated with the notion of the Laplace operator $\Delta$, cf. [4]. The related diffusion equation $\partial_t\varphi=\Delta\varphi$ is extended to the Schroedinger equation
$$\partial_t\psi(x,t)=\Delta\psi(x,t)+v(x)\psi(x,t) \qquad (2)$$
by adding a potential term $v(x)$ to the Laplace operator. We revisit this idea in the following section to introduce a discrete Schroedinger operator.

B. Schroedinger Eigenmaps
The potential $v$ in (2) is considered as a nonnegative multiplier operator. The discrete analogue of $\mathcal{E}=\Delta+v$ is the matrix $E=L+V$, where $V$ is a nonnegative diagonal $m\times m$ potential matrix. We replace (1) with
$$\min_{y^\top Dy=I}\operatorname{trace}\big(y^\top(L+\alpha V)y\big), \qquad (3)$$
which is equivalent to $\min_{y^\top Dy=I}[\operatorname{trace}(y^\top Ly)+\operatorname{trace}(y^\top\alpha Vy)]$. The parameter $\alpha\ge 0$ is added here so that it can be used to emphasize the relative significance of the potential $V$ with respect to the graph Laplace operator.

According to (1), the minimization problem (3) is equivalent to
$$\min_{y^\top Dy=I}\ \tfrac{1}{2}\sum_{i,j}\|y_i-y_j\|^2\,W_{i,j}+\alpha\sum_i V(i)\|y_i\|^2, \qquad (4)$$
where $V=\operatorname{diag}(V(1),\dots,V(m))$. The first component of the above sum incurs a penalty when neighboring points $x_i$ and $x_j$ are mapped into points $y_i$ and $y_j$, respectively, which are far apart. The second component penalizes those points $y_i$, $i=1,\dots,m$, which are associated with large values of $V(i)$. Alternatively speaking, if $V$ took only the two values $0$ and $1$, then the minimization (4) yields a dimension-reduced representation $y$ which forces increased clustering of the representations $y_i$ of points associated with the value $V(i)=1$, while attempting to ensure that close points remain close after the dimension reduction. As such, we may utilize the potential $V$ to label points which we want to be identified together after the dimension reduction. Because of the built-in preservation of the topology of the point cloud (induced by the Laplacian), this labeling may be used to segment a particular class of points. We shall make this intuition clear in the following sections, see, e.g., Corollary III.2 and Corollary III.4, or the applications in Section IV.

It is known that the (rescaled) graph Laplacian converges towards the Laplace-Beltrami operator on the underlying manifold, e.g., [6], [44]. This convergence carries over to the Schroedinger operator, as we shall see next. The Laplacian map is given by
$$L_m^{\sigma}f(x_i)=f(x_i)\sum_j e^{-\|x_i-x_j\|^2/\sigma}-\sum_j f(x_j)\,e^{-\|x_i-x_j\|^2/\sigma}.$$
For the purpose of this analysis we choose $k=m$, i.e., we assume that all data points are connected. Replacing $x_i$ by an arbitrary $x\in\mathbb{R}^N$ yields the extension to $\mathbb{R}^N$:
$$L_m^{\sigma}f(x)=f(x)\sum_j e^{-\|x-x_j\|^2/\sigma}-\sum_j f(x_j)\,e^{-\|x-x_j\|^2/\sigma}.$$
It turns out that there exists a positive constant $C$ such that for i.i.d. uniformly sampled data points $\{x_1,\dots,x_m\}$ and $\sigma_m=4m^{-1/(n+2+s)}$, where $s>0$, and $f\in C^\infty(\mathcal{M})$, we have the convergence
$$\lim_{m\to\infty}\frac{C}{m}\,(\pi\sigma_m)^{-(n+2)/2}\,L_m^{\sigma_m}f(x)=\Delta_{\mathcal{M}}f(x) \qquad (5)$$
in probability, cf. [6]. Let $v$ be a given potential on the manifold $\mathcal{M}$. The associated matrix $V$ acting on a discrete $m$-point cloud is defined as $V=\operatorname{diag}(v(x_1),\dots,v(x_m))$. Since the potential does not depend on $\sigma$, we may replace $x_i$ by an arbitrary $x$ to obtain the map $V_mf(x)=v(x)f(x)$. Clearly, this extension coincides with the continuous potential on the manifold. As such, adding the discrete potential $V_m$ to the discrete Laplacian does not impede the convergence in (5). Consequently, the term
$$\frac{C}{m}\,(\pi\sigma_m)^{-(n+2)/2}\,L_m^{\sigma_m}f(x)+V_mf(x) \qquad (6)$$
converges for $m\to\infty$ towards $\Delta_{\mathcal{M}}f(x)+v(x)f(x)$ in probability. For more results on the convergence of the Schroedinger Eigenmaps algorithm to Schroedinger operators on manifolds, we refer the interested reader to [33] and [20].

We close this section by noting that expression (6) induces a specific choice of the parameter $\alpha$.
Indeed, in order to consider $E=L+\alpha V$, the rescaling of $L_m^{\sigma_m}$ in (6) implies the need to reversely rescale $V$. This can be done by means of multiplication with $\alpha=\frac{m}{C}(\pi\sigma_m)^{(n+2)/2}$, which tends to infinity as $m\to\infty$. As such, the optimal selection of $\alpha$ is related to the size of the dataset.
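As a concrete illustration of the discrete operator $E=L+\alpha V$, the following sketch extends the Laplacian Eigenmaps code above by a diagonal barrier potential; the interface is our own, and the choice of $\alpha$ (including its growth with the dataset size, as discussed above) is left to the user.

```python
import numpy as np
from scipy.linalg import eigh

def schroedinger_eigenmaps(L, D, potential, alpha, n_components=2):
    """Sketch of the SE step: solve (L + alpha*V) y = lambda D y.

    L, D      : graph Laplacian and degree matrix from the sketch above.
    potential : length-m array, nonzero on labeled nodes (diagonal V).
    alpha     : relative weight of the potential versus the Laplacian.
    """
    E = L + alpha * np.diag(potential)   # discrete Schroedinger operator E = L + alpha*V
    _, vecs = eigh(E, D)                 # generalized eigenproblem, ascending eigenvalues
    # As for Laplacian Eigenmaps, skip the eigenvector with smallest eigenvalue
    # and keep the n subsequent ones (cf. Remark III.5 below).
    return vecs[:, 1:n_components + 1]
```

For a labeling application, one would set the potential to one on the expert-labeled nodes and to zero elsewhere; the analysis below quantifies what this does to the embedding.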
C. General Properties

In this section we study the properties of the Schroedinger Eigenmaps (SE). We shall see that the potential can be used to push vectors towards zero, which can be helpful in classification tasks. In fact, we shall see later that even nondiagonal "potentials" make sense and, thus, we address the more general setting where $E=L+\alpha V$ is symmetric positive semi-definite. Note that $E$ can still be diagonalized and all eigenvalues are nonnegative. If $V\neq 0$ is a nonnegative diagonal potential, then $\mathbb{1}\notin\operatorname{null}(V)$. For a connected graph $G$, the latter implies that $E=L+\alpha V$ has full rank, i.e., there is no eigenvalue that equals zero.

To study the effect of the potential $V$ on eigenvectors of the Schroedinger operator, we make use of the notation $\|y\|_M^2:=\operatorname{trace}(y^\top My)$ for a matrix $M$, and we denote $\mathcal{V}:=D^{-1/2}VD^{-1/2}$:

Theorem III.1.
Let $V$ be symmetric positive semi-definite, $n\le\dim(\operatorname{null}(\mathcal{V}))$, and let $y^{(\alpha)}$ be a minimizer of (3).
a) There is a constant $C\ge 0$ such that
$$\|y^{(\alpha)}\|_V^2\le\frac{C}{\alpha}. \qquad (7)$$
b) If $z^{(\alpha)}_{\mathrm{null}}$ is the columnwise orthogonal projection of $z^{(\alpha)}=D^{1/2}y^{(\alpha)}$ onto $\operatorname{null}(\mathcal{V})\neq\{0\}$, then there is a constant $C\ge 0$ such that
$$\|z^{(\alpha)}_{\mathrm{null}}\|_I^2\ge n-\frac{C}{\alpha}. \qquad (8)$$

Proof:
First, we address part (a). Applying $z=D^{1/2}y$ to (3) yields
$$\min_{z^\top z=I}\operatorname{trace}\big(z^\top(\mathcal{L}+\alpha\mathcal{V})z\big). \qquad (9)$$
Since $\mathcal{V}$ is symmetric, (9) can be rewritten as
$$\min_{\substack{z=z_{\mathrm{null}}+z_{\mathrm{null}^\perp},\ z^\top z=I\\ z_{\mathrm{null}}\in\operatorname{null}(\mathcal{V}),\ z_{\mathrm{null}^\perp}\in\operatorname{null}(\mathcal{V})^\perp}}\operatorname{trace}\big(z^\top\mathcal{L}z+z_{\mathrm{null}^\perp}^\top\alpha\mathcal{V}z_{\mathrm{null}^\perp}\big),$$
where $z_{\mathrm{null}}$ is a matrix whose columns are contained in $\operatorname{null}(\mathcal{V})$. The constant
$$C_0:=\min_{\substack{z_{\mathrm{null}}^\top z_{\mathrm{null}}=I\\ z_{\mathrm{null}}\in\operatorname{null}(\mathcal{V})}}\operatorname{trace}(z_{\mathrm{null}}^\top\mathcal{L}z_{\mathrm{null}})$$
only depends on the nullspace of $\mathcal{V}$, and it is associated with the choice $z_{\mathrm{null}^\perp}=0$. Since $y^{(\alpha)}$ is a minimizer of (3), $z^{(\alpha)}=D^{1/2}y^{(\alpha)}$ is a minimizer of (9). Obviously, we have $\|z^{(\alpha)}\|_{\mathcal{L}}^2+\|z^{(\alpha)}\|_{\alpha\mathcal{V}}^2\le C_0$, and according to $\|z^{(\alpha)}\|_{\mathcal{L}}^2\ge 0$, we obtain $\|z^{(\alpha)}\|_{\alpha\mathcal{V}}^2\le C_0$. This finally yields $\|z^{(\alpha)}\|_{\mathcal{V}}^2\le C_0/\alpha$. Since $\|z^{(\alpha)}\|_{\mathcal{V}}^2=\|y^{(\alpha)}\|_V^2$, we have proven (7).

In order to verify (8), we will use $z_{\mathrm{null}}^{(\alpha)\top}z_{\mathrm{null}}^{(\alpha)}=I-z_{\mathrm{null}^\perp}^{(\alpha)\top}z_{\mathrm{null}^\perp}^{(\alpha)}$. Note that $\mathcal{V}$ is symmetric positive semi-definite, and therefore $\|\cdot\|_{\mathcal{V}}$ is a norm on $\operatorname{null}(\mathcal{V})^\perp$. Since all norms on a finite-dimensional vector space are equivalent, there is a constant $C_1>0$ such that $\|\cdot\|_I^2\le C_1\|\cdot\|_{\mathcal{V}}^2$ on $\operatorname{null}(\mathcal{V})^\perp$. By applying $C:=C_1C_0$, we get the estimate
$$\|z_{\mathrm{null}}^{(\alpha)}\|_I^2=n-\|z_{\mathrm{null}^\perp}^{(\alpha)}\|_I^2\ge n-C_1\|z_{\mathrm{null}^\perp}^{(\alpha)}\|_{\mathcal{V}}^2=n-C_1\|z^{(\alpha)}\|_{\mathcal{V}}^2=n-C_1\|y^{(\alpha)}\|_V^2\ge n-\frac{C}{\alpha}.$$

Let us now state the consequences of Theorem III.1 for a diagonal potential $V$.

Corollary III.2.
Under the assumptions and notation of Theorem III.1, let $V=\operatorname{diag}(v_1,\dots,v_m)$. Then we have, for all $i=1,\dots,m$,
$$v_i\|y_i^{(\alpha)}\|^2\le\sum_{i=1}^m v_i\|y_i^{(\alpha)}\|^2=\|y^{(\alpha)}\|_V^2\le\frac{C}{\alpha}.$$

Remark III.3.
Let $V=\operatorname{diag}(1,0,\dots,0)$; then $y_1^{(\alpha)}$ is pushed to zero according to Corollary III.2. If $x_1$ and $x_2$ were sufficiently close, then we expect that $y_2^{(\alpha)}$ is also pushed to zero due to the linkage induced by $L$.

Corollary III.2 means that a diagonal potential pushes entries of $y^{(\alpha)}$ to zero. On the other hand, the potential does not force any collapse:

Corollary III.4.
Under the assumptions and notation of Theorem III.1, let $V=\operatorname{diag}(v_1,\dots,v_m)$.
a) If $v_1,\dots,v_r\neq 0$ and $v_{r+1},\dots,v_m=0$, then
$$\sum_{i=r+1}^m D_{i,i}\|y_i^{(\alpha)}\|^2\ge n-\max_{i=1,\dots,r}\Big\{\frac{D_{i,i}}{v_i}\Big\}\,\frac{C}{\alpha}.$$
b) If $v_1=D_{1,1},\dots,v_r=D_{r,r}$ and $v_{r+1},\dots,v_m=0$, then
$$\sum_{i=r+1}^m D_{i,i}\|y_i^{(\alpha)}\|^2\ge n-\frac{C}{\alpha}.$$

Proof:
Part (b) is a special case of part (a). To prove part (a), we apply Corollary III.2, which yields
$$\sum_{i=r+1}^m D_{i,i}\|y_i^{(\alpha)}\|^2=\sum_{i=r+1}^m\|z_i^{(\alpha)}\|^2=\|z_{\mathrm{null}}^{(\alpha)}\|_I^2=n-\|z_{\mathrm{null}^\perp}^{(\alpha)}\|_I^2=n-\sum_{i=1}^r D_{i,i}\|y_i^{(\alpha)}\|^2$$
$$\ge n-\sum_{i=1}^r\frac{D_{i,i}}{v_i}\,v_i\|y_i^{(\alpha)}\|^2\ge n-\max_{i=1,\dots,r}\Big\{\frac{D_{i,i}}{v_i}\Big\}\sum_{i=1}^r v_i\|y_i^{(\alpha)}\|^2\ge n-\max_{i=1,\dots,r}\Big\{\frac{D_{i,i}}{v_i}\Big\}\,\frac{C}{\alpha}.$$

Corollary III.2 says that penalized locations are pushed to zero, and Corollary III.4 ensures that not all unlabeled data are collapsed into zero. Thus, labeling can lead to a separation between labeled and certain unlabeled data.
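The two corollaries are easy to check numerically on a toy point cloud. The following small script is our own illustration, not part of the paper; it reuses knn_heat_graph and schroedinger_eigenmaps from the sketches above.

```python
import numpy as np

# Toy check of Corollaries III.2 and III.4: the penalized coordinate should
# decay with alpha, while the bulk of the embedding keeps its scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))                  # toy point cloud, m = 80
W, D, L = knn_heat_graph(X, k=8, sigma=1.0)
v = np.zeros(80)
v[0] = 1.0                                    # diagonal potential on node 0 only
for alpha in (1e0, 1e1, 1e2, 1e3):
    Y = schroedinger_eigenmaps(L, D, v, alpha)
    # ||y_0|| is bounded by sqrt(C/alpha) (Corollary III.2), so it shrinks as
    # alpha grows; the overall embedding norm stays of order one (Corollary III.4).
    print(alpha, np.linalg.norm(Y[0]), np.linalg.norm(Y) / np.sqrt(len(X)))
```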
D. Schroedinger Eigenmaps: Nondiagonal Potentials
While a diagonal potential $V$ is used to overweight or underweight certain nodes $y_i$, it turns out that nondiagonal "potentials" are useful as well. For $i\neq j$, let $V^{(i,j)}$ be the matrix with entries $V^{(i,j)}_{i,i}=V^{(i,j)}_{j,j}=1$ and $V^{(i,j)}_{i,j}=V^{(i,j)}_{j,i}=-1$ and zeros elsewhere. One then verifies that $\operatorname{trace}(y^\top\alpha V^{(i,j)}y)=\alpha\|y_i-y_j\|^2$, and Theorem III.1 yields
$$\|y_i^{(\alpha)}-y_j^{(\alpha)}\|^2\le\frac{C}{\alpha}.$$
In other words, this kind of potential allows us to identify points with each other. Adding the matrix $V^{(i,j)}$ to the Laplacian has the flavor of modifying the underlying graph $G$ by reweighting its edges. It allows for modifying the weights $W_{i,j}$, $i\neq j$, and we are able to penalize distances between data points which are not neighbors. We note that a similar change in the graph structure results from warping or weighted diffusion as proposed, e.g., in [17] or [41].

Remark III.5.
If the graph $G$ is connected, then $E=L+\alpha V^{(i,j)}$ has a simple eigenvalue equal to zero, since $\mathbb{1}\in\operatorname{null}(V^{(i,j)})$. Thus, we should use the constraint $y^\top D\mathbb{1}=0$ for SE with potentials $V^{(i,j)}$. To establish a uniform methodology of Schroedinger Eigenmaps, we use the constraint $y^\top Dw=0$ throughout, where $w$ is an eigenvector of the Schroedinger operator $E=L+\alpha V$ with respect to the smallest eigenvalue. Thus, as in the case of Laplacian Eigenmaps, we skip the smallest eigenvector and take the $n$ subsequent ones.

To demonstrate the use of the barrier potential, SE is applied to a three-dimensional arc in Figure 1.

Fig. 1. Schroedinger Eigenmaps applied to an arc. (a) (left) Original arc in three-dimensional space. (right) Laplacian Eigenmaps perfectly recovers the two-dimensional structure. (b) Schroedinger Eigenmaps with the diagonal potential $V=\operatorname{diag}(0,\dots,0,1,0,\dots,0)$, acting only in one point $y_i$ in the middle of the arc, for increasing values of $\alpha$. This point is pushed to zero. Since $V$ is 'discontinuous', the dimension-reduced arc becomes more and more discontinuous until $y_i$ becomes zero for sufficiently large $\alpha$; then the arc does not deform any more. (c) By labeling the end points of the arc with the nondiagonal potential $V^{(i,j)}$, for increasing values of $\alpha$, we are able to control the dimension reduction such that we obtain an almost perfect circle (up to a small discontinuity).

Remark III.6.
For the purpose of sequentially identifying points, we propose to use the potential
$$V=\sum_{i=1}^{m-1}V^{(i,i+1)}, \qquad (10)$$
which penalizes $L$ by $\sum_{i=1}^{m-1}\|y_i-y_{i+1}\|^2$. Note that the eigenvectors of this matrix are the basis vectors of the discrete cosine transform of type II.
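In code, the nondiagonal potentials are as easy to assemble as the diagonal one. The following helpers are our own sketch (hypothetical names); for a nondiagonal potential, $E=L+\alpha V$ is passed directly to the eigensolver rather than built from a diagonal.

```python
import numpy as np

def pair_potential(m, i, j):
    """Nondiagonal potential V^(i,j): penalizes alpha * ||y_i - y_j||^2,
    pulling the representations of nodes i and j together."""
    V = np.zeros((m, m))
    V[i, i] = V[j, j] = 1.0
    V[i, j] = V[j, i] = -1.0
    return V

def chain_potential(m):
    """Potential (10): sum of V^(i,i+1), identifying points sequentially;
    this is the Laplacian of the path graph, whose eigenvectors are the
    DCT-II basis vectors."""
    V = np.zeros((m, m))
    for i in range(m - 1):
        V += pair_potential(m, i, i + 1)
    return V

# Embedding step for a nondiagonal potential, e.g.:
#   E = L + alpha * pair_potential(m, i, j)
#   _, vecs = scipy.linalg.eigh(E, D)
```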
E. Schroedinger Eigenmaps For Classification

In this section we discuss the standard Vector Angle Classification (VAC) in combination with Schroedinger Eigenmaps. In this regard, suppose we have $s$ pairwise different seed vectors $a_1,\dots,a_s\in\mathbb{R}^n$, which represent the centers of the classes. The class $C_i$ consists of those $y_j^{(\alpha)}$ whose angle is closest to the center vector $a_i$, as long as the angle is smaller than some tightness parameter $\delta_i$. We additionally apply a threshold parameter $\delta$ to reject those $y_j^{(\alpha)}$ that have small norm; these by themselves form another class. According to Corollary III.2, SE can be applied to "force" processed pixels to zero at the rate $1/\sqrt{\alpha}$ and, hence, below any threshold $\delta>0$ for sufficiently large $\alpha$.

A practitioner can use labels in the form of a diagonal potential to separate a particular region of an image, see Section IV-B. On the other hand, different spatial regions can be identified with each other by labeling them with nondiagonal potentials $V^{(i,j)}$, cf. (10). This is useful to classify pathological areas as one single class despite the fact that intrinsic intra-class variations can be large.

As mentioned in the introduction, a very considerable effort has been aimed at developing partially labeled classification schemes based on the Laplacian Eigenmaps algorithm, on related graph random walks, or on Diffusion Maps. The classification scheme based on Schroedinger Eigenmaps with partially labeled data is novel. When compared to "warping" and function-adapted diffusion processes [17], [41], Schroedinger Eigenmaps result in a more general class of matrices (operators), by allowing modifications to the diagonal as well as to the off-diagonal terms. These diagonal terms, in turn, are associated with locations of the barrier potential, which affects and steers the diffusion process. On the other hand, Schroedinger Eigenmaps classification is a joint optimization process for labeled and unlabeled data, as opposed to classical semi-supervised algorithms [3], [5].
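A minimal sketch of VAC as we read it from the description above; the function name, the handling of rejected points, and the numerical safeguards are our own assumptions.

```python
import numpy as np

def vector_angle_classify(Y, seeds, tightness=None, delta=0.0):
    """Sketch of vector angle classification (VAC).

    Y         : (m, n) embedded points y_j^(alpha).
    seeds     : (s, n) seed vectors a_1, ..., a_s (class centers).
    tightness : optional length-s array of angular thresholds delta_i (radians).
    delta     : norm threshold; small-norm points form their own class (-1).
    """
    norms = np.linalg.norm(Y, axis=1)
    seed_norms = np.linalg.norm(seeds, axis=1)
    cos = (Y @ seeds.T) / (np.outer(norms, seed_norms) + 1e-12)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    labels = angles.argmin(axis=1)                 # closest seed by angle
    if tightness is not None:
        # Reject points whose best angle exceeds the class tightness delta_i.
        labels[angles.min(axis=1) > np.asarray(tightness)[labels]] = -1
    labels[norms < delta] = -1                     # points "forced to zero", cf. Cor. III.2
    return labels
```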
IV. BIOMEDICAL APPLICATIONS

The performance of Schroedinger Eigenmaps on three standard medical datasets is compared with linear and Gaussian kernel Support Vector Machines (SVM). After having verified the usefulness of Schroedinger Eigenmaps, we apply them to analyze new problems in genetic expression analysis and retinal multispectral imaging.
A. Three standard bio-medical datasets
All standard datasets are obtained from the UCI Machine Learning Repository. Training points are randomly selected. The potential is designed on the training data, and SE is computed on the entire dataset, so that VAC with threshold leads to the final class separation.

After removing missing values, the Wisconsin Breast Cancer Database (WBCD) [47] contains 683 patterns with 9 attributes and is grouped into two classes, benign and malignant. The diagonal potential is used on the benign class in the training data. The processed vectors that fall below a threshold form the predicted benign class on the entire dataset.

The Cleveland Heart Disease Database (CHDD) [22] contains 303 patterns with 13 attributes and is grouped into presence of heart disease (values 1, 2, 3, 4) and absence (value 0). We removed 6 patterns due to missing values. Following the focus in the literature, we aim to separate absence from heart disease. We label absence in the training data using the diagonal potential and identify the heart disease classes 1-4 with each other by applying the nondiagonal potential. The predicted absence class is formed by the vectors that fall below the threshold.

The Mammographic Mass Data Set (MMD) [27] contains 961 instances with 6 attributes that are grouped into benign and malignant. Note that we removed 131 patterns due to missing values and suppressed the first attribute, as it was a numerical assessment of the range between benign and malignant from a double-review process by physicians.

In our computations of SE, we keep a fixed number of eigenvectors, the number of nearest neighbors varies within a small range, and the weight parameter $\alpha$ in the potential and the threshold parameter are optimized for minimizing the error rates. We compare our method with Laplacian Eigenmaps (same parameters as for SE) and with SVM. The Matlab implementation of SVM is used with a linear and a Gaussian kernel function, and the separating hyperplane is found from least squares or a one-norm soft-margin SVM; the best result is taken. The parameter in the SVM Gaussian kernel corresponds to $\sigma$ in Schroedinger Eigenmaps. After optimizing $\sigma$ in SVM for each dataset separately, it appeared consistently that the best results for both SVM and SE were obtained by dataset-specific choices of $\sigma$ for WBCD, CHDD, and MMD, respectively.

Error rates for large training data yield information on the nonlinear data structure. For instance, as the error of the linear SVM applied to WBCD does not vanish even for larger training sets, the data cannot be separated in a linear fashion, see Table I.

The comparison to Laplacian Eigenmaps (same parameters as for Schroedinger Eigenmaps) shows how the potential can improve the classification process. If there are only few training data, then Schroedinger Eigenmaps lead to the smallest error rates among all methods, most considerably for CHDD and MMD. When increasing the number of training points, the Gaussian kernel SVM yields the best results, except for MMD. Typical bio-medical data allow for only few reliable training data, and in this situation Schroedinger Eigenmaps appear most useful.
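For reference, the experiments just described amount to the following pipeline (our own composition of the earlier sketches; the parameter handling and train/test mechanics are assumptions, not the authors' exact protocol):

```python
import numpy as np

def se_binary_classify(X, train_idx, alpha, n_components, k, sigma, delta):
    """Sketch: label one class on training points via a diagonal potential,
    run SE on the entire dataset, and threshold the embedding norms (VAC
    with threshold). Returns a boolean prediction for the labeled class."""
    W, D, L = knn_heat_graph(X, k=k, sigma=sigma)
    v = np.zeros(len(X))
    v[train_idx] = 1.0                            # barrier potential on labeled class
    Y = schroedinger_eigenmaps(L, D, v, alpha, n_components)
    return np.linalg.norm(Y, axis=1) < delta      # pushed-to-zero => predicted class
```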
B. Applications to new biomedical data

Up to this point we have compared Schroedinger Eigenmaps with other methods for semi-supervised classification. We believe the strength of our proposed technique is three-fold: good performance with few labeled points, a global approach yielding good local discrimination, and the ability to correctly classify in the presence of nonlinear mixing. In this section we shall illustrate some of these points with two examples: a classification problem in medical imaging, and a clustering problem in genetic analysis. At the same time, these two examples give additional evidence for the results of the first part of our paper.
1) Analysis of clinical retinal images:
First, we analyze a problem in retinal imaging, where our goal is to detect and differentiate between two classes of chemical mixtures in the human retina. This problem is of interest to the medical community, as it appears in studies of age-related macular degeneration (AMD), the leading cause of blindness among the elderly population in the developed world, see [14], [13], [19], [36].
TABLE I
Error rates as incorrectly classified samples / total data, averaged over repeated instances for each method. LE is combined with VAC, where training data are used as seeds that always end up in the correct class; therefore, the LE outcome for large training data is not due to the LE method.

dataset, # train | lin. SVM | Gauss. SVM | LE  | SE
WBCD, 40         | 14%      | 5%         | 5%  | 4%
WBCD, 100        | 9%       | 4%         | 4%  | 3%
WBCD, –          | –        | 4%         | 3%  | 3%
WBCD, –          | –        | 4%         | 2%  | 3%
WBCD, –          | –        | 4%         | 2%  | 3%
CHDD, 40         | 42%      | 21%        | 19% | 15%
CHDD, –          | –        | 17%        | 15% | 12%
CHDD, –          | –        | 16%        | 11% | 12%
CHDD, –          | –        | 15%        | 9%  | 11%
MMD, 20          | 45%      | 24%        | 24% | 22%
MMD, 30          | 40%      | 22%        | 22% | 21%
MMD, 40          | 38%      | 22%        | 21% | 20%
MMD, –           | –        | 21%        | 20% | 20%
MMD, –           | –        | 20%        | 19% | 20%
MMD, –           | –        | 20%        | 18% | 20%
HEREFORE , THE LE OUTCOME FOR LARGE TRAINING DATA IS NOT DUE TO THE LE METHOD of blindness among the elderly population in the developedworld, see [14], [13], [19], [36]. The balance between thesetwo chemical mixtures, which we want to detect, can beused to determine the stage of the disease and, in turn, theproper course of action [31], [32], [45]. There is demandin the medical community for automated analysis tools thatenable detection and classification while, at the same time,allowing for expert input [24], [34], [38].We shall apply our Schroedinger Eigenmaps technique,with the locations of the potential chosen by medicalexperts. This example further illustrates that relatively few(less than 0.1%) labels may lead to good classificationresults. An added difficulty in this scenario is that, due tolimitations of the imaging device, the spectral informationabout the chemicals of interest is always nonlinearly mixedwith various other responses, e.g., from layers just beneaththe retina, or from different illumination patterns. As such,most of the mixing models that rely on presence of purepixels in the image fail to work.For each patient, 4 excitation with 2 emission filters andtrifold imaging lead to images ( × pixels) thatare aligned by applying the commercial software i2k Align.One pixel, therefore, has entries in z -direction, so thatthe acquired dataset are vectors in -dimensionalspace. The earliest clinical signs of AMD are bright spotsin retinal reflection images, see Fig. 2(a), referred to as drusen [9]. We have applied Schroedinger Eigenmaps withdiagonal potential acting on a small predefined drusen area( pixels) that was identified by medical experts. Thenumber of nearest neighbors was chosen to be , σ = 1 , α in the potential is times the average norm of the datavectors, and we used eigenvectors. We obtain a drusenmap of all pixel vectors whose Schroedinger Eigenmapscoordinates fall below a threshold that was adjusted throughvisual inspection, cf. Fig. 2(b).It was suggested in [21], that drusen centers may separatefrom outer drusen areas. To distinguish the two drusen (a) drusen appear as brightspots; one is marked by thegreen square; detection and fur-ther subclassification is needed (b) only a small drusen region( pixels) in the upper left cor-ner was penalized in the barrierpotential providing one class ofdrusen(c) blue area ( pixels)was additionally identifiedby SE (non-diag. matrix);VAC with threshold yields well-separated drusen classes(black/red). Consistent with[21], we separate drusencenters from outer drusensegments (d) zoomed onto the markeddrusen in (a); red lines show LRWsegmentation; drusen center andouter drusen area were separatedby using sets of labels; drusencenter (blue), drusen outer area(yellow), background (green)(e) The class identified in (b) isfurther subclassified by runninglinear SVM on this subset withlabels on the drusen center (f) Gaussian SVM with labeleddata as in (e) applied to theclass in (b)Fig. 2. Classification with Schroedinger Eigenmaps, LRW verification, andcomparisons to SVM based on multi-spectral retinal image sets subclasses, we additionally form a second potential on adrusen center ( pixels) that identifies pixels and run SEsimultaneously with both potentials activated. We obtaintwo subclasses of drusen, one by pixels whose SchroedingerEigenmaps coordinates fall below the threshold, the otherone is derived from taking the average over identified pixelvectors as a seed in VAC. 
Threshold and class tightness parameters are optimized through visual inspection, see Fig. 2(c). The same label locations are used for linear and Gaussian SVM, which are applied to the detected drusen in Fig. 2(b) for further subclassification, see Fig. 2(e,f).

In a single drusen, members of the same class are spatially connected, so that we can apply the segmentation tool Laplacian Random Walk (LRW) of [30], confirming the drusen subclassification, cf. Fig. 2(d). SE appears well-suited to incorporate spatially unconnected regions, going beyond image segmentation.

Fig. 2. Classification with Schroedinger Eigenmaps, LRW verification, and comparisons to SVM based on multi-spectral retinal image sets. (a) Drusen appear as bright spots; one is marked by the green square; detection and further subclassification are needed. (b) Only a small drusen region in the upper left corner was penalized in the barrier potential, providing one class of drusen. (c) The blue area was additionally identified by SE (nondiagonal matrix); VAC with threshold yields well-separated drusen classes (black/red). Consistent with [21], we separate drusen centers from outer drusen segments. (d) Zoomed onto the marked drusen in (a); red lines show the LRW segmentation; drusen center and outer drusen area were separated by using sets of labels; drusen center (blue), drusen outer area (yellow), background (green). (e) The class identified in (b) is further subclassified by running linear SVM on this subset with labels on the drusen center. (f) Gaussian SVM with labeled data as in (e) applied to the class in (b).
2) Analysis of gene expressions:
Recent efforts in biology focus on the understanding of genetic datasets based on complex gene interactions, rather than analyzing genes individually. This shift of perspective increases the importance of small local variations, which may be irrelevant from a purely global perspective. Therefore, analysis and classification of time series gene expressions require a balance between local and global variations.

Hierarchical clustering in [12] and Laplacian Eigenmaps in [25] were used to tentatively analyze a collection of microarray gene expressions at time-points related to eye morphology in mice, see [10], [48] for further background. The goal of those studies was to identify new gene clusters within this dataset that are associated with eye development. We used a barrier potential in [26] to label certain DNA-specific proteins (transcription factors) and found a set of new genes related to eye morphogenesis which were not discovered by traditional hierarchical techniques. Thus, even a preliminary application of the barrier potential for data labeling proved useful in [26] to add complementary expert data, enabling improvements in gene clustering.
V. CONCLUSIONS

We have introduced a generalization of Laplacian Eigenmaps that allows for expert data in the form of a potential on a data-dependent graph. The properties of our novel Schroedinger Eigenmaps are studied, and we illustrate their action on both artificial and standard medical data. At this stage these are only illustrations, and providing the mathematical formalism behind these results is part of our ongoing and future work on this subject. It appears that our method is robust and particularly useful if the data quality is low and there are only few training data, a typical situation in bio-medical applications. In addition, our scheme is successfully applied to the analysis of retinal multispectral images. We classify drusen based on few training pixels that were labeled by a physician. Our results suggest that Schroedinger Eigenmaps are useful in high-dimensional bio-medical data analysis when limited expert knowledge is available.
ACKNOWLEDGMENT

This work was supported by the Intramural Research Program of NICHD/NIH, by NSF (CBET0854233), by NGA (HM15820810009), by ONR (N000140910144), and by NIH/DFG (EH 405/1-1/575910). The authors also express their gratitude to Drs. R. F. Bonner, B. Brooks, E. Y. Chew, and S. M. Meyers for their assistance with acquiring and analyzing the included datasets. We also acknowledge V. N. Rajapakse and the anonymous referees for their valuable comments and suggestions that led to the improvement of this paper.

REFERENCES

[1] M. Aubry, U. Schlickewei, and D. Cremers, "The wave kernel signature: A quantum mechanical approach to shape analysis," in IEEE International Conference on Computer Vision, 2011.
[2] C. M. Bachmann, T. L. Ainsworth, and R. A. Fusina, "Exploiting manifold geometry in hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 441–454, 2005.
[3] M. Belkin and P. Niyogi, "Using manifold structure for partially labeled classification," in NIPS, 2002.
[4] ——, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput., vol. 15, no. 6, pp. 1373–1396, 2003.
[5] ——, "Semi-supervised learning on Riemannian manifolds," Machine Learning, vol. 56, pp. 209–239, 2004.
[6] ——, "Towards a theoretical foundation for Laplacian-based manifold methods," J. Comput. System Sci., vol. 74, no. 8, pp. 1289–1308, 2008.
[7] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, vol. 7, pp. 2399–2434, Nov. 2006.
[8] J. J. Benedetto, W. Czaja, J. C. Flake, and M. Hirn, "Frame based kernel methods for automatic classification in hyperspectral data," in IEEE IGARSS, Cape Town, South Africa, 2009.
[9] A. C. Bird et al., "An international classification and grading system for age-related maculopathy and age-related macular degeneration. The international ARM epidemiological study group," Surv. Ophthalmol., vol. 39, no. 5, pp. 367–374, Mar-Apr 1995.
[10] R. F. Bonner et al., "Laser capture microdissection: molecular analysis of tissue," Science, vol. 21, no. 278, pp. 1481–1483, 1997.
[11] A. M. Bronstein, "Spectral descriptors for deformable shapes," arXiv:1110.5015v1, 2011.
[12] J. D. Brown et al., "Expression profiling during ocular development identifies 2 nlz genes with a critical role in optic fissure closure," Proc. Nat. Acad. Sci. USA, vol. 106, no. 5, pp. 1462–7, Feb 2009.
[13] E. Y. Chew et al., "Risk of advanced age-related macular degeneration after cataract surgery in the age-related eye disease study: AREDS report 25," Ophthalmology, vol. 116, no. 2, pp. 297–303, Feb 2009.
[14] E. Y. Chew, A. S. Lindblad, T. Clemons, and Age-Related Eye Disease Study Research Group, "Summary results and recommendations from the age-related eye disease study," Arch. Ophthalmol., vol. 127, no. 12, pp. 1678–9, Dec 2009.
[15] F. R. K. Chung, "Spectral Graph Theory," CBMS Regional Conference Series in Mathematics, vol. 92, 1997.
[16] R. R. Coifman and S. Lafon, "Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions," Appl. Comput. Harmon. Anal., vol. 21, no. 1, pp. 31–52, 2006.
[17] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. J. Warner, and S. W. Zucker, "Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part I: Diffusion maps," Proc. Nat. Acad. Sci., vol. 102, pp. 7426–7431, 2005.
[18] R. R. Coifman and M. Maggioni, "Diffusion wavelets," Appl. Comput. Harmon. Anal., vol. 21, no. 1, pp. 53–94, 2006.
[19] H. R. Coleman, C. Chan, F. L. Ferris, and E. Y. Chew, "Age-related macular degeneration," Lancet, vol. 372, no. 9652, pp. 1835–45, Nov 2008.
[20] W. Czaja and A. Halevy, "On convergence of Schroedinger Eigenmaps," preprint, 2011.
[21] F. C. Delori et al., "Autofluorescence distribution associated with drusen in age-related macular degeneration," Invest. Ophthalmol. Vis. Sci., vol. 41, no. 2, pp. 496–504, 2000.
[22] R. Detrano et al., "International application of a new probability algorithm for the diagnosis of coronary artery disease," American Journal of Cardiology, vol. 64, pp. 304–310, 1989.
[23] D. L. Donoho and C. Grimes, "Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data," Proc. Nat. Acad. Sci., vol. 100, pp. 5591–5596, 2003.
[24] M. Ehler, J. Dobrosotskaya, et al., "Modeling photo-bleaching kinetics to map local variations in rod rhodopsin density," SPIE Medical Imaging, Computer-Aided Diagnosis, vol. 7963, 2011.
[25] M. Ehler, V. N. Rajapakse, B. Zeeberg, B. Brooks, J. Brown, W. Czaja, and R. F. Bonner, "Analysis of temporal-spatial co-variation within gene expression microarray data in an organogenesis model," in Lecture Notes in Bioinformatics, Springer Verlag, 2010.
[26] ——, "Nonlinear gene cluster analysis with labeling for microarray gene expression data in organ development," BMC Proceedings, vol. 5 (Suppl 2), p. S3, 2011.
[27] M. Elter, R. Schulz-Wendtland, and T. Wittenberg, "The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process," Medical Physics, vol. 34, no. 11, pp. 4164–4172, 2007.
[28] S. Gerber, T. Tasdizen, and R. Whitaker, "Robust non-linear dimensionality reduction using successive 1-dimensional Laplacian Eigenmaps," in Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 281–288.
[29] Y. Goldberg, A. Zakai, D. Kushnir, and Y. Ritov, "Manifold learning: The price of normalization," Journal of Machine Learning Research, vol. 9, pp. 1909–1939, 2008.
[30] L. Grady, "Random walks for image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11, pp. 1768–1783, Nov. 2006.
[31] G. S. Hageman et al., "An integrated hypothesis that considers drusen as biomarkers of immune-mediated processes at the RPE-Bruch's membrane interface in aging and age-related macular degeneration," Prog. Retin. Eye Res., vol. 20, no. 6, pp. 705–732, 2001.
[32] R. Haimovici et al., "The lipid composition of drusen, Bruch's membrane, and sclera by hot stage polarizing light microscopy," Invest. Ophthalmol. Vis. Sci., vol. 42, no. 7, pp. 1592–1599, 2001.
[33] A. Halevy, "Extensions of Laplacian Eigenmaps for manifold learning," Ph.D. Thesis, University of Maryland, 2011.
[34] J. Kainerstorfer, M. Ehler, et al., "Principal component model of multi spectral data for near real-time skin chromophore mapping," J. Biomed. Opt., vol. 15, 2010.
[35] M. Kim and V. Pavlovic, "Covariance operator based dimensionality reduction with extension to semi-supervised settings," in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, vol. 5, 2009.
[36] S. M. Meyers, M. A. Ostrovsky, and R. F. Bonner, "A model of spectral filtering to reduce photochemical damage in age-related macular degeneration," Trans. Am. Ophthalmol. Soc., vol. 102, pp. 83–93; discussion 93–5, 2004.
[37] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[38] Z. B. Sbeh and L. D. Cohen, "A new approach of geodesic reconstruction for drusen segmentation in eye fundus images," IEEE Trans. on Medical Imaging, vol. 20, no. 12, pp. 1321–1333, 2001.
[39] B. Schölkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[40] A. Sundaresan and R. Chellappa, "Model-driven segmentation of articulating humans in Laplacian eigenspace," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1771–85, Oct 2008.
[41] A. D. Szlam, M. Maggioni, and R. R. Coifman, "Regularization on graphs with function-adapted diffusion processes," Journal of Machine Learning Research, vol. 9, pp. 1711–1739, 2008.
[42] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–23, Dec 2000.
[43] M. W. Trosset and M. Tang, "On combinatorial Laplacian Eigenmaps," Technical Report, Department of Statistics, Indiana University, 2010.
[44] U. von Luxburg, M. Belkin, and O. Bousquet, "Consistency of spectral clustering," Ann. Stat., vol. 36, no. 2, pp. 555–586, 2008.
[45] L. Wang et al., "Abundant lipid and protein components of drusen," PLoS One, vol. 5, no. 4, p. e10329, 2010.
[46] R. C. Wilson, E. R. Hancock, and B. Luo, "Pattern vectors from algebraic graph theory," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 7, pp. 1112–24, Jul 2005.
[47] W. H. Wolberg and O. L. Mangasarian, "Multisurface method of pattern separation applied to breast cytology diagnosis," Proceedings of the National Academy of Sciences, vol. 87, pp. 9193–9196, 1990.
[48] B. Zeeberg et al., "GoMiner: A resource for biological interpretation of genomic and proteomic data," Genome Biology, vol. 4, no. 4, 2003.
[49] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in ICML, 2003.