Truecluster matching
Jens Oehlschlägel    [email protected]
Abstract
Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the Euclidean distance between two cluster solutions, which induces inappropriate stability in certain settings. Therefore, we present the truematch algorithm, which introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a χ²-transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking into account the marginals. Second, we suggest a probabilistic component in order to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.

Keywords: Hungarian method, truematch, truecluster, MMCC, CIC, Hornik (2005)
1. Introduction
Applying a cluster algorithm to a dataset results in fuzzy or crisp assignments of cases to anonymous clusters. In order to interpret these clusters, we often wish to compare them to other classifications, so some heuristic is needed to match one classification to another. With the advent of resampling and ensemble methods in clustering (Gordon and Vichi, 2001; Dimitriadou et al., 2002; Strehl and Ghosh, 2002), the task of matching cluster solutions has become even more important: we need reliable and scalable matching algorithms that do the task fully automated.

Consider, for example, the use of bootstrapping or cross-validation for cluster validation as suggested by many authors (Moreau and Jain, 1987; Jain and Moreau, 1988; Tibshirani et al., 2001; Roth et al., 2002; Ben-Hur et al., 2002; Dudoit and Fridlyand, 2002): many cluster solutions are created and agreement between them is evaluated. Some agreement indices do not need explicit cluster matching (Rand, 1971; Hubert and Arabie, 1985), but others can only be applied after cluster solutions have been matched, for example, Cohen's kappa (1960).

Recently, authors have suggested transferring the idea of bagging (Breiman, 1996) to clustering. Some approaches aggregate cluster centers (Leisch, 1999; Dolnicar and Leisch, 2000; Bakker and Heskes, 2001) or aggregate consensus between pairs of observations (Monti et al., 2003; Dudoit and Fridlyand, 2003, BagClust2 algorithm). Other approaches aggregate cluster assignments and, therefore, require cluster matching, for example, the crisp BagClust1 algorithm of Dudoit and Fridlyand (2003), the combination scheme for fuzzy clustering of Dimitriadou et al. (2002), or truecluster (Oehlschlägel, 2007b).

Truecluster is an algorithmic framework for robust scalable clustering with model selection that combines the idea of bagging with information-theoretic model selection along the lines of AIC (Akaike, 1973, 1974) and BIC (Schwarz, 1978). In order to calculate its cluster information criterion (CIC), truecluster requires a reliable cluster matching algorithm. The truematch algorithm presented here was designed to play that role. The organization of the paper is as follows: in Section 2, we show an undesirable feature of the standard approach to cluster matching. In Section 3, we present the truematch algorithm. In Section 4, we demonstrate the benefits of the truematch algorithm within the truecluster framework. In Section 5, we use simulation to compare truematch against standard trace maximization matching, and in Section 6, we discuss our results.
2. What’s wrong with trace maximization of the matching table
The standard approach to cluster matching is searching for that permutation of cluster labels that minimizes the Euclidean distance to a reference cluster solution. This criterion has been suggested for fuzzy consensus clustering (Gordon and Vichi, 2001; Dimitriadou et al., 2002) as well as for crisp consensus clustering (Strehl and Ghosh, 2002) or crisp cluster bagging (Dudoit and Fridlyand, 2003, BagClust1). In the crisp case, this criterion is simply trace maximization of matching table counts: cross-tabulating class memberships of two solutions and then permuting rows/columns of the matching table until the trace becomes maximal. To our knowledge, cluster publications and software differ in the algorithms used to obtain trace maximization, but do not question the Euclidean criterion per se.

For example, Dimitriadou et al. (2002) suggested a recursive heuristic to approximate trace maximization. It is known that trying all permutations has time complexity O(K!), where K denotes the number of clusters. The Hungarian method improves on this and achieves polynomial time complexity O(K³). Kuhn (1955) published a pencil and paper version, which was followed by J. R. Munkres' executable version (Munkres, 1957) and extended to non-square matrices by Bourgeois and Lassalle (1971). For a list of further algorithmic approaches to this so-called linear sum assignment problem or weighted bipartite matching, see Hornik (2005).

However, scalability is not the only quality aspect of a matching algorithm. An important statistical feature of a matching algorithm is the following: if we match two random partitions, the matching algorithm should not systematically align the two partitions. We now show that classic trace maximization does not generally possess this feature.

Assume a cluster algorithm that claims to identify an outlier in a sample of size N = 100 but which actually declares one case as 'outlying' at random. Now assume a procedure that draws two bootstrap samples and clusters them into 99% 'normal' cases and one 'outlier'. In 1% of such procedures, the outlier picked in the second sample will randomly match the outlier picked in the first sample. In such cases, trace maximization matching will lead to a matching table as shown in Table 1. In the other 99%, there will be no match, which, by trace maximization, gives a matching table like that shown in Table 2. The resulting expected matching table is shown in Table 3.

         a    b
    a   99    0
    b    0    1

    Table 1: Random matching (1%)

         a    b
    a   98    1
    b    1    0

    Table 2: Typical trace maximization matching (99%)

          a        b
    a   98.01%    0.99%
    b    0.99%    0.01%

    Table 3: Expected trace maximization matching

We can see that under random clustering, we expect 98.02% on the main diagonal, which at first glance looks like a strong (non-random) match. Only applying standard random correction (Cohen, 1960) confirms this to be a pure random match (Cohen's kappa = 0). However, in a clustering context we have two objections against relying on such random corrections. As far as evaluation of cluster agreement is concerned, random corrections such as Cohen's kappa or Hubert and Arabie's corrected rand index do not work properly, because spatial neighbors have an above-random chance of being clustered together in the absence of any cluster structure in the data. Therefore, agreement indices are too optimistic even with random correction. More importantly, in other contexts such as bagging there is no random correction available at all. If cluster sizes are (very) different, bagging cluster results will suffer because in standard trace maximization big randomly matched cells win over small cells representing non-random matches. Therefore, we are looking for a matching algorithm that does not systematically generate a strong diagonal under random conditions.
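The arithmetic behind Table 3 is just the probability-weighted mixture of the two outcomes; the following minimal R sketch reproduces it (the object names T1, T2, and E are ours, for illustration only):

    T1 <- matrix(c(99, 0,
                    0, 1), 2, 2, byrow = TRUE)  # random match, 1% of procedures (Table 1)
    T2 <- matrix(c(98, 1,
                    1, 0), 2, 2, byrow = TRUE)  # typical trace maximization match (Table 2)
    E  <- 0.01 * T1 + 0.99 * T2                 # expected matching table (Table 3)
    sum(diag(E)) / sum(E)                       # 0.9802: strong diagonal despite kappa = 0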
3. Truematch algorithm
The problems with standard trace maximization described in the previous section result from focusing on raw counts in a situation with unequal marginal (cluster) probabilities. From other contexts, we know that this is not a good idea. Take the χ²-test for statistical independence of two categorical variables: it is not based on raw counts. Instead, the matching table of raw counts is transformed to another unit taking the marginals into account. Let N denote the total number of observations, n_k the number of observations in one row, n_l the number of observations in one column and, finally, let n_{k,l} denote the number of observations in one cell of the K × K cluster crosstable. The first step in calculating χ² is to calculate for each cell the number of expected counts \hat{n}_{k,l} under the assumption of independence:

    \hat{n}_{k,l} = p_k \cdot p_l \cdot N = \frac{n_k \cdot n_l}{N}    (1)

Then, using the expected counts from Equation 1, we transform the matrix of raw counts into a matrix of normalized squared deviations d_{k,l} from the null model:

    d_{k,l} = \frac{(n_{k,l} - \hat{n}_{k,l})^2}{\hat{n}_{k,l}}    (2)

The χ²-value is defined as the sum of Equation 2 over all cells. If we restore the sign in Equation 2, we get:

    s_{k,l} = \operatorname{sign}(n_{k,l} - \hat{n}_{k,l}) \cdot d_{k,l}    (3)

In order to cope with unequal cluster sizes, we suggest basing cluster matching on maximizing the trace of s_{k,l} rather than on maximizing the trace of n_{k,l}. And in order to avoid any systematic effect not based on the data, we add a probabilistic component to the matching algorithm. Consequently, we define the truematch algorithm as:

1. Randomly permute rows and columns of the matching table
2. Transform the matching table counts n_{k,l} to signed normalized squared deviations s_{k,l} using Equation 3
3. Apply a trace maximization algorithm like the Hungarian method to maximize the trace (in fact, the Hungarian method minimizes -s_{k,l})
4. Order the resulting row/column pairs descending by s_{k,l}, breaking ties at random

If no trace maximization algorithm like the Hungarian method is available, the matching can easily be done using the truematch heuristic, similar to the heuristic suggested by Dimitriadou et al. (2002); a sketch in R is given at the end of this section:

1. Calculate signed normalized squared deviations s_{k,l} for all remaining cells of the matching table
2. Order all cells descending by s_{k,l} and by n_{k,l} (breaking ties at random) and denote the first cell as the target cell
3. Match the row of the target cell to the column of the target cell
4. Remove the row and the column of the target cell from the matching table
5. If both the number of remaining rows and columns is at least two, repeat from step 1

It is obvious that the truematch algorithm has runtime complexity O(K³) like the Hungarian method. The truematch heuristic also nicely translates into polynomial runtime. The number of residuals calculated to reduce the matching table from k to k−1 rows/columns is k², thus the total number of residuals calculated is K² + (K−1)² + (K−2)² + ... + 2² = K·(K+1)·(2K+1)/6 − 1, which gives runtime complexity O(K³) and memory complexity O(K²) if the recursive nature of the algorithm is realized using a while-loop.

R package truecluster (Oehlschlägel, 2007a) implements the truematch algorithm in matchindex(method = "truematch") and the truematch heuristic in matchindex(method = "tracemax") efficiently through underlying C code.

Applying the truematch algorithm and the truematch heuristic to the above example gives identical results: as in standard trace maximization matching, we find 1% random matches as in Table 1, but for the 99% non-random matching cases, truematch generates two versions of matching tables, see Table 4. Both versions have shifted the majority of counts off-diagonal. Due to the probabilistic component in the 2nd step, this leads to an expected matching (Table 5) that has a weak trace. Under truematch, only systematic, non-random matches will result in a strong diagonal.

         a    b          a    b
    a    1   98     a    1    0
    b    0    1     b   98    1

    Table 4: Typical truematch (49.5% + 49.5%)

          a        b
    a    1.98%   48.51%
    b   48.51%    1.00%

    Table 5: Expected truematch

We can quantify the benefit of truematch in this case by comparing expected values of certain agreement indices, cf. Table 6. The rand index (Rand, 1971) and its random corrected version crand (Hubert and Arabie, 1985) are invariant against row/column permutations and, thus, do not differ. There is also no difference for kappa (Cohen, 1960). However, the big difference is on the simple non-random-corrected diagonal fraction of observations: while trace maximization misleadingly results in an expected diagonal close to 1, truematch reduces the expectation of this non-random-corrected index close to zero. In the next two sections, we will explore the benefit of truematch in a bagging context, where the main diagonal defines the matching but no random correction is available.

                                fraction   diagonal   kappa    rand    crand
    Tracemax   RandomMatch        1.0%       1.00      1.00    1.000    1.00
    Tracemax   NonRandomMatch    99.0%       0.98     -0.01    0.960   -0.01
    Tracemax   Expected         100.0%       0.98      0.00    0.960    0.00
    Truematch  RandomMatch        1.0%       1.00      1.00    1.000    1.00
    Truematch  NonRandomMatch    99.0%       0.02     -0.01    0.960   -0.01
    Truematch  Expected         100.0%       0.03      0.00    0.960    0.00

    Table 6: Expected agreement indices for trace maximization vs. truematch
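As a concrete illustration, here is a minimal R sketch of the truematch heuristic under the definitions above (our own illustration, not the C implementation from package truecluster; the function name truematch_heuristic is ours):

    ## Minimal sketch of the truematch heuristic; 'tab' is a K x K matching
    ## table of counts, e.g. tab <- table(labels1, labels2).
    truematch_heuristic <- function(tab) {
      K <- nrow(tab)
      stopifnot(K == ncol(tab))
      rows <- 1:K
      cols <- 1:K
      res <- integer(K)                  # res[k] = column matched to row k
      while (length(rows) > 1) {
        n    <- tab[rows, cols, drop = FALSE]
        nhat <- outer(rowSums(n), colSums(n)) / sum(n)  # expected counts, Eq. (1)
        d    <- (n - nhat)^2 / nhat                     # squared deviations, Eq. (2)
        d[is.nan(d)] <- 0                               # guard against empty rows/columns
        s    <- sign(n - nhat) * d                      # signed deviations, Eq. (3)
        ## target cell: highest s, then highest n, remaining ties broken at random
        target <- order(-s, -n, sample(length(s)))[1]
        i <- (target - 1) %%  length(rows) + 1          # row of target cell
        j <- (target - 1) %/% length(rows) + 1          # column of target cell
        res[rows[i]] <- cols[j]
        rows <- rows[-i]
        cols <- cols[-j]
      }
      res[rows] <- cols                  # the last remaining pair matches itself
      res
    }

For the full truematch algorithm, step 3 could, for example, hand the (shifted, non-negative) s-matrix to a Hungarian-method solver such as solve_LSAP(s - min(s), maximum = TRUE) from package clue (Hornik, 2005).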
4. The role of truematch in truecluster
The truecluster concept (Oehlschlägel, 2007b) suggests a cluster information criterion (CIC) that evaluates for each cluster model (for each number of clusters) an N × K matrix P̂ that aggregates votes over many resamples. P̂ is created by the multiple match cluster count (MMCC) algorithm using the truematch algorithm as follows (see the sketch at the end of this section):

1. Create an N × K matrix C and initialize each cell C_{i,k} with zero
2. Take a resample (with replacement) of size N and use a base cluster algorithm to fit the K-cluster model c* to the resample. Then, use a suitable prediction method to determine cluster membership of the out-of-resample cases to get a complete cluster vector c′ with N elements c′_i
3. For each row in C, add one vote (add 1) to the column corresponding to the cluster membership in c′
4. Repeat step 2
5. Estimate cluster memberships ĉ by row-wise majority count in C (breaking ties at random), use the truematch algorithm or heuristic to align c′ with ĉ, and rename the clusters in c′ like the corresponding clusters in ĉ
6. For each row in C, add one vote (add 1) to the column corresponding to the cluster membership in c′
7. Repeat from step 4 until some reasonable convergence criterion is reached
8. Divide each cell in C by its row sum to get a matrix of estimated cluster membership probabilities P̂

Table 7 summarizes simulations with truecluster versus consensus clustering (100 cases, 10,000 replications; for details see MMCCconcensus.r in R package truecluster (Oehlschlägel, 2007a); the table is sorted and grouped by the magnitude of CIC values). For random data without cluster structure, we would expect a very 'fuzzy' P̂ without clear preferences for any cluster. Furthermore, we would expect CIC to increase for models with more true clusters and to decrease if models try to distinguish more clusters than justified by the data.

Table 7 shows that the MMCC algorithm using truematch delivers on this expectation: CIC increases for justified clusters and declines for unjustified ones, even if unjustified clusters in the model are small. This works because once cluster decisions are unjustified, the truematch algorithm starts distributing its votes randomly across indistinguishable columns of C and, thus, 'fuzzifies' P̂. Compare that to consensus clustering (Dimitriadou et al., 2002) based on trace maximization obtained with R package clue (Hornik and Boehm, 2007; Hornik, 2005). Models with unjustified small clusters get CIC values as high as models without the unjustified cluster. This is a consequence of the trace maximization matching, which adds inappropriate stability to the voting. Take, for example, the "random 99:1" model, which is as unjustified as the "random 50:50" model but receives a much higher CIC value. The stability induced by the trace maximization matching results in quite a crisp P̂: for each row, we find high probability for one cluster and low probability for the other. If we assign cases to clusters based on the maximum probability per row in P̂, all cases are assigned to the same cluster. Such a degenerate P̂ is not wrong but unfortunate. If we manually analyze P̂, we might detect that P̂ actually represents a one-cluster (K = 1) model. But if we are after automatic selection of models (number of clusters), it is misleading that P̂ does not represent K = 2 but K = 1. Analyzing a consensus cluster solution P̂_K for degeneracies does not really help: the estimated probabilities can be biased even before the matrix formally degenerates.

    MMCC                        true K   model K     H       RMC      I       CIC
    random 50:49:1                 1        3      1.578    0.020   0.044   -1.534
    random 99:1                    1        2      1.000    0.010   0.014   -0.985
    random 50:50                   1        2      0.995    0.010   0.059   -0.936
    single 100                     1        1      0.000    0.000   0.000    0.000
    justified 50 random 49:1       2        3      0.499    0.018   0.695    0.196
    justified 50:50                2        2      0.000    0.010   0.990    0.990

    consensus                   true K   model K     H       RMC      I       CIC
    random 50:49:1                 1        3      1.066    0.011   0.049   -1.016
    random 50:50                   1        2      0.995    0.010   0.048   -0.947
    random 99:1                    1        2      0.081    0.001   0.001   -0.080
    single 100                     1        1      0.000    0.000   0.000    0.000
    justified 50 random 49:1       2        3      0.071    0.011   0.965    0.895
    justified 50:50                2        2      0.000    0.010   0.990    0.990

    true K                    true number of clusters
    model K                   model number of clusters
    H                         model uncertainty
    RMC                       relative model complexity
    I                         model information
    CIC                       cluster information criterion (I − H)
    single 100                theoretical values for single group (no cluster)
    random 50:50              random clustering with 2 equal sized clusters
    random 99:1               random clustering with 2 unequal sized clusters
    random 50:49:1            random clustering with 3 unequal sized clusters
    justified 50:50           justified clustering with 2 equal sized clusters
    justified 50 random 49:1  2 justified clusters, one randomly split unequal sized

    Table 7: Consensus clustering vs. truecluster
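To make the voting scheme concrete, the following R sketch implements the MMCC loop under simplifying assumptions: kmeans as base cluster algorithm, nearest-center prediction for out-of-resample cases, and a fixed number of iterations B instead of a convergence criterion. The names mmcc_sketch and truematch_heuristic are ours (the latter is the sketch from Section 3), not the package API.

    ## Minimal sketch of the MMCC voting loop (steps 1-8 above).
    mmcc_sketch <- function(x, K, B = 1000) {
      x <- as.matrix(x)                          # assumes enough distinct points for kmeans
      N <- nrow(x)
      C <- matrix(0L, N, K)                      # step 1: N x K vote matrix
      for (b in seq_len(B)) {                    # steps 4/7: fixed B instead of convergence
        idx <- sample(N, N, replace = TRUE)      # step 2: resample with replacement
        fit <- kmeans(x[idx, , drop = FALSE], centers = K)
        ## predict all N cases by their nearest cluster center
        d2 <- as.matrix(dist(rbind(fit$centers, x)))[-(1:K), 1:K, drop = FALSE]
        cprime <- max.col(-d2)                   # complete cluster vector c'
        if (b > 1) {                             # step 5: align c' with current c-hat
          chat <- max.col(C, ties.method = "random")
          tab  <- table(factor(cprime, levels = 1:K), factor(chat, levels = 1:K))
          perm <- truematch_heuristic(tab)
          cprime <- perm[cprime]                 # rename clusters in c' like in c-hat
        }
        C[cbind(seq_len(N), cprime)] <- C[cbind(seq_len(N), cprime)] + 1L  # steps 3/6
      }
      C / rowSums(C)                             # step 8: estimated probabilities P-hat
    }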
5. Simulation results
In order to systematically investigate the consequences of the different features of truematch versus simple trace maximization matching, we have carried out extensive simulations within the truecluster framework: we assume two clusters, vary their relative size p and the reliability κ of a fictitious clustering algorithm, and compare the truecluster results gained via trace maximization versus truematch. We did two versions of the simulations: in the non-fixed version, p just determines sampling probabilities; in the fixed version, the fictitious clustering algorithm enforces the exact relative size p of the two clusters. Details of the simulation are given in Appendix A.

Figure 1 shows information, uncertainty, and their difference CIC for the non-fixed simulations. White areas denote simulation trials where the truecluster algorithm degenerated from a 2-cluster solution to a 1-cluster solution. The most notable difference is the big share of non-converged truecluster solutions using trace maximization, compared to the truematch algorithm. The estimated information, given reliability and skewness, is very similar and reasonable: information is highest for p = 0.5 and κ = 1 and declines when lowering κ and/or skewing p.

By contrast, compared for uncertainty and for the CIC, trace maximization and truematch differ dramatically. Using trace maximization, the uncertainty estimate does not only depend on κ but is also artificially lower for higher skewness. As a consequence, cluster models with unequal cluster sizes get better CIC values than cluster models with equal cluster sizes. Using the truematch algorithm almost avoids this undesirable pattern: the estimated uncertainty depends almost only on κ, not on p. The estimated CIC shows a very reasonable pattern: at high κ, the CIC is highest for equal sized clusters, conforming with the entropy principle; at low κ, the CIC is low, however skewed p is. Only at very extreme p is the CIC biased downwards: too small clusters cannot be detected with too small a sample size. Extreme models are non-identifiable and the uncertainty estimate has high variance. Keep in mind that 'extreme' p corresponds to very few cases at a sample size of N = 100. The fixed simulations gave similar results (Figure 2).

In summary, trace maximization fails to estimate uncertainty independent of skewness and tends to overestimate CIC for unequal cluster sizes, or fails to converge. This restricts its usefulness for cluster evaluation and bagging. By contrast, the truematch algorithm works at almost any combination of reliability and skewness (with the exception of non-identifiable models, given the sample size).

[Figure 1: Results of non-fixed simulations]

[Figure 2: Results of fixed simulations]
6. Discussion
We have shown that trace maximization matching fails to behave sufficiently neutrally when matching clusterings. The problem arises generally but is especially important in contexts where random correction is not applicable. As an alternative, we have presented the truematch algorithm and heuristic, which both probabilistically generate neutral expected matching tables and scale in polynomial time. Our simulations have confirmed that truematch avoids unjustified (expected) matchings induced by unequal cluster sizes. For the simulations done here, the truematch algorithm and the truematch heuristic behave identically. Since the truematch heuristic does not guarantee maximizing the χ²-criterion, we expect the truematch algorithm to be superior. However, there is a subtle difference: while the matching of the truematch algorithm depends solely on s_{k,l}, the truematch heuristic uses s_{k,l} and n_{k,l} to select the row/column matches. Therefore, a final decision about an optimal matching algorithm needs more investigation.

Truematch is central to the MMCC algorithm, which creates the basis for the CIC evaluation in the truecluster framework and, thus, contributes to solving the decades-old problem of choosing the optimal number of clusters. Beyond that, cluster bagging in general could benefit from using truematch: the resulting N × K matrix is fuzzified rather than degenerate for unjustified cluster splits. This allows for better automated processing of such results. It is an open question whether the truematch algorithm also has advantages for consensus clustering, or whether different usages of cluster ensembles require different matching algorithms.

Acknowledgments
We would like to thank Dr. Stefan Pilz for reviewing this paper and giving valuable hints for improvement.

Appendix A.
In this appendix, we give details concerning the simulations in Section 5. Assume a vector x of length 100 with 'true' sample group memberships, where p denotes the fraction of 1 and (1 − p) the fraction of 0. Let p_1 denote the matrix of joint probabilities for a case's true and clustered classification when the cluster algorithm perfectly separates 0 from 1 (at κ = 1):

    p_1 = \begin{pmatrix} 1-p & 0 \\ 0 & p \end{pmatrix}

Let p_0 denote the matrix of joint probabilities for a case's true and clustered classification when the cluster algorithm makes a random guess when separating 0 from 1 (at κ = 0):

    p_0 = \begin{pmatrix} (1-p)^2 & (1-p) \cdot p \\ (1-p) \cdot p & p^2 \end{pmatrix}

Then p_κ denotes the matrix of joint probabilities for a case's true and clustered classification when the cluster algorithm has reliability κ:

    p_\kappa = \kappa \cdot p_1 + (1-\kappa) \cdot p_0

The two conditional probabilities p_id that the clustering algorithm identifies the true class, given the true class, are:

    p_{id} = \kappa + (1-\kappa) \cdot \begin{pmatrix} 1-p \\ p \end{pmatrix}

For each value of p ∈ {1/100, 2/100, .., 99/100} and each value of κ ∈ {0.00, 0.01, 0.02, .., 1.00}, we simulate aggregation of 1000 bootstrap samples from x. For each bootstrap sample, our fictitious cluster algorithm assigns cases with probability p_id to the true class and with probability 1 − p_id to the other class. The resulting cluster memberships c* are matched versus the (current) estimated cluster memberships ĉ of the cases in the bootstrap sample. If c* or ĉ does not contain two classes, the bootstrap sample is dropped and replaced by another one. Differently from the MMCC algorithm in Section 4, we do not predict cluster memberships of the out-of-bag cases. We use c* directly instead of c′; consequently, the rows of C are not guaranteed to have aggregated an equal number of votes. For all combinations of p and κ, we calculate information, uncertainty, and CIC (Oehlschlägel, 2007b) for the resulting 99 × 101 truecluster models P̂. These values are visualized using color coding, and contour lines are added based on a loess smooth. To create the fixed version, the complete procedure is repeated, additionally enforcing a fixed fraction p by moving randomly selected observations in c* from the too big group to the too small one, analogous to a cluster algorithm that forces certain cluster sizes. The R code doing the simulation is available in truematch.r in package truecluster (Oehlschlägel, 2007a).
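As a minimal illustration of this setup, one bootstrap replicate of the fictitious clustering algorithm can be sketched in R as follows (p_id and simulate_cstar are our hypothetical helper names, not code from the package):

    ## Conditional probability of identifying the true class, given true
    ## class 0 or 1 (last equation above).
    p_id <- function(p, kappa) kappa + (1 - kappa) * c(1 - p, p)

    ## One bootstrap replicate: resample x (coded 0/1) and assign each case
    ## to its true class with probability p_id, to the other class otherwise.
    simulate_cstar <- function(x, p, kappa) {
      xb  <- sample(x, length(x), replace = TRUE)
      pid <- p_id(p, kappa)[xb + 1]
      hit <- rbinom(length(xb), 1, pid) == 1
      ifelse(hit, xb, 1 - xb)            # cluster memberships c*
    }

    ## Example: p = 0.3, N = 100, reliability kappa = 0.8
    x <- rep(0:1, times = c(70, 30))
    cstar <- simulate_cstar(x, p = 0.3, kappa = 0.8)

References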
H. Akaike. Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and F. Csáki, editors, Second International Symposium on Information Theory, pages 267–281, Budapest, 1973. Akademiai Kiadó. Reprinted in Breakthroughs in Statistics, eds Kotz, S. and Johnson, N. L. (1992), volume I, pp. 599–624. New York: Springer.

H. Akaike. A new look at statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.

Bart Bakker and Tom Heskes. Model clustering and resampling, 2001. URL citeseer.ist.psu.edu/bakker00model.html.

A. Ben-Hur, A. Elisseeff, and I. Guyon. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, 7:6–17, 2002.

François Bourgeois and Jean-Claude Lassalle. An extension of the Munkres algorithm for the assignment problem to rectangular matrices. Communications of the ACM, 14(12):802–804, 1971.

L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.

Jacob Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37–46, 1960.

E. Dimitriadou, A. Weingessel, and K. Hornik. A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16:901–912, 2002.

S. Dolnicar and F. Leisch. Behavioural market segmentation using the bagged clustering approach based on binary guest survey data: Exploring and visualizing unobserved heterogeneity. Tourism Analysis, 5(2–4):163–170, 2000.

S. Dudoit and J. Fridlyand. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology, 3(7):research0036.1–0036.21, 2002.

S. Dudoit and J. Fridlyand. Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19(9):1090–1099, 2003.

A. D. Gordon and M. Vichi. Fuzzy partition models for fitting a set of partitions. Psychometrika, 66:229–248, 2001.

Kurt Hornik. A CLUE for CLUster Ensembles. Journal of Statistical Software, 14(12), September 2005.

Kurt Hornik and Walter Boehm. clue: Cluster ensembles, 2007. R package version 0.3-11.

Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2:193–218, 1985.

A. K. Jain and J. Moreau. Bootstrap techniques in cluster analysis. Pattern Recognition, 20:547–568, 1988.

H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:225–231, 1955.

Friedrich Leisch. Bagged clustering. Working Paper 51, SFB Adaptive Information Systems and Modelling in Economics and Management Science, Vienna University of Economics and Business Administration in cooperation with the University of Vienna and Vienna University of Technology, 1999.

Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52:91–118, 2003.

J. V. Moreau and A. K. Jain. The bootstrap approach to clustering. In P. A. Devijver and J. Kittler, editors, Pattern Recognition: Theory and Applications, volume 30 of NATO ASI Series F, pages 63–71. Springer, 1987.

J. Munkres. Algorithms for the assignment and transportation problems. Journal of the SIAM, 5:32–38, 1957.

Jens Oehlschlägel. truecluster: An algorithmic framework for robust and scalable clustering, 2007a. R package version 0.3 (version 1.0 and higher will also be hosted at CRAN.R-project.org).

Jens Oehlschlägel. Truecluster: robust scalable clustering with model selection. Submitted to JMLR, 2007b.

W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66:846–850, 1971.

Volker Roth, Tilman Lange, Mikio Braun, and Joachim M. Buhmann. A resampling approach to cluster validation. In Wolfgang Härdle and Bernd Rönz, editors, Proceedings in Computational Statistics: 15th Symposium Held in Berlin (COMPSTAT 2002), pages 123–128, Heidelberg, 2002. Physica-Verlag.

G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.

A. Strehl and J. Ghosh. Cluster ensembles – a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, 2002.

Robert Tibshirani, Guenther Walther, David Botstein, and Patrick Brown. Cluster validation by prediction strength. Technical report, Stanford University, 2001.