KDD-SC: Subspace Clustering Extensions for Knowledge Discovery Frameworks
Stephan Günnemann (RWTH Aachen University, Germany; Carnegie Mellon University, USA)
Hardy Kremer, Matthias Hannen, Thomas Seidl (RWTH Aachen University, Germany)
{lastname}@cs.rwth-aachen.de, [email protected]
ABSTRACT
Analyzing high dimensional data is a challenging task. For these data it is known that traditional clustering algorithms fail to detect meaningful patterns. As a solution, subspace clustering techniques have been introduced. They analyze arbitrary subspace projections of the data to detect clustering structures. In this paper, we present our subspace clustering extension for KDD frameworks, termed KDD-SC. In contrast to existing subspace clustering toolkits, our solution is neither a standalone product nor tightly coupled to a specific KDD framework. Our extension is realized by a common codebase and easy-to-use plugins for three of the most popular KDD frameworks, namely KNIME, RapidMiner, and WEKA. KDD-SC extends these frameworks such that they offer a wide range of subspace clustering functionality. It provides a multitude of algorithms, data generators, evaluation measures, and visualization techniques specifically designed for subspace clustering. These functionalities integrate seamlessly with the frameworks' existing features such that they can be flexibly combined. KDD-SC is publicly available on our website.
1. INTRODUCTION
Clustering is one of the core data mining tasks. The goal of clustering is to automatically group similar objects while separating dissimilar ones. Traditional clustering methods consider all dimensions of the data space to measure the similarity between objects. For today's high dimensional data, however, these full-space clustering approaches fail to detect meaningful patterns since irrelevant dimensions obfuscate the clustering structure [7, 16]. Using global dimensionality reduction techniques such as principal component analysis is not sufficient to solve this problem: by definition, all objects are projected to the same lower dimensional subspace. However, as Figure 1 illustrates, each cluster might have locally relevant dimensions, and objects can be part of multiple clusters in different subspaces. These effects cannot be captured by global dimensionality reduction approaches.

To tackle this challenge, subspace clustering techniques have been introduced, aiming at detecting locally relevant dimensions per cluster [16, 21]. They analyze arbitrary subspace projections of the data to detect the hidden clusters. Typical applications for subspace clustering include gene expression analysis, customer profiling, and sensor network analysis. In each of these scenarios, subsets of the objects (e.g., genes) are similar regarding subsets of the dimensions (e.g., different experimental conditions).
Figure 1: Subspace clusters hidden in locally relevant subspace projections
Existing systems:
Today, general data mining functionality is provided to the end-user in a convenient and intuitive way by established knowledge discovery frameworks such as KNIME (Konstanz Information Miner, [6]), RapidMiner [18], and WEKA (Waikato Environment for Knowledge Analysis, [14]). These systems are successfully and frequently used in research and practice. The applicability of subspace clustering, in contrast, is still limited.

So far, there are two systems that support the user in the task of subspace clustering, namely OpenSubspace [20] and ELKI [1]. Both systems are milestones in the process of providing subspace clustering functionality to the end-user, but have severe limitations concerning their integration into established data mining workflows. While ELKI, as a standalone Java framework, does not offer any integration into existing data mining toolkits, OpenSubspace is highly coupled and in its current form only applicable within the WEKA framework. Due to this strong coupling, it is difficult to integrate new algorithms and to (re)use already implemented subspace clustering functionality in other KDD frameworks. Accordingly, for end-users running their established KDD workflows in frameworks other than WEKA or ELKI, the integration of subspace clustering into these workflows is a hard and time-consuming challenge.
Our contribution:
In this paper, we propose a new system for subspace clustering, which is seamlessly integrated into KNIME, RapidMiner, and WEKA. By covering this broad spectrum of knowledge discovery frameworks, many researchers and practitioners can benefit from our system. It is based on a common code basis across all KDD frameworks. Thus, it is possible to quickly deploy new subspace clustering methods in multiple frameworks at the same time. By integrating our system into these established knowledge discovery frameworks, the user can easily use subspace clustering functionality within the whole KDD process. Our methods can be combined with the existing algorithms, data transformation techniques, and visualization tools of these frameworks. Overall, our system offers
• a seamless integration of subspace clustering functionality into KNIME, RapidMiner, and WEKA. Accordingly, many researchers and practitioners can use their established KDD workflows without any loss in productivity.
• a common code basis for subspace clustering algorithms, evaluation measures, and synthetic data generators. It is independent of the chosen data mining framework and realizes easy extensibility and reusability of all components.
• visualization and interaction principles for subspace clustering exploiting the capabilities of the different data mining toolkits, which support the user in the interpretation of the obtained results.
2. GENERAL ARCHITECTURE
In this section we describe the general architecture and functionality of our subspace clustering extension. The usage of our extension within the different knowledge discovery frameworks is described in Sections 3-5.

For reusability and easy portability of the developed methods, our KDD-SC framework is separated into a core package (CoreSC) and packages realizing the integration into the different KDD frameworks (KnimeSC, RapidSC, WekaSC). Figure 2 shows an overview of this design.
[Figure 2 depicts the adapter packages KnimeSC, RapidSC, and WekaSC on top of CoreSC, which provides the subspace clustering algorithms, data generators, evaluation measures, and visualization.]
Figure 2: General Architecture of KDD-SC
In the core package, the actual functionality of our system is implemented. This functionality is independent of a specific system. The core package is divided into four major components: subspace clustering algorithms, data generators, evaluation measures, and visualization tools. A detailed description of these components is provided in the following sections.

The WekaSC user interface as well as parts of the core package (algorithms & evaluation measures) have been extracted from the OpenSubspace project [20]. In contrast to the original OpenSubspace, which was tightly bundled with a specific WEKA version, our redesigned WekaSC implements the WEKA plugin interface and enables easy extension through our component-based design.

In the three packages KnimeSC, RapidSC, and WekaSC we included the implementations that are necessary to realize an interaction of the knowledge discovery frameworks with the core package. Thus, these packages act as adapters between the core package and the actual system. In the KnimeSC package, for example, we implemented the node-based representation of the algorithms as required by the KNIME framework (cf. Section 3).

By using a common code base, i.e. the CoreSC package, it is easy to integrate new subspace clustering techniques into each of the knowledge discovery frameworks: a new subspace clustering algorithm only has to be implemented once, in the core package. Additionally, one can easily support other (e.g., R) or even new data mining frameworks by simply providing a new adapter package.
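To make the adapter idea concrete, the following is a minimal, purely illustrative Java sketch of how a host-framework package can delegate to a framework-agnostic core; all class and method names here are our own inventions and do not reflect the real KnimeSC/RapidSC/WekaSC code or the host frameworks' plugin APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Framework-agnostic core entry point (hypothetical simplification of CoreSC). */
interface CoreClusterer {
    /** Returns, per cluster, the ids of the clustered objects (simplified output). */
    List<int[]> run(List<double[]> data);
}

/** Adapter: translates a host framework's row format and delegates to the core. */
class HostFrameworkNode {
    private final CoreClusterer core;

    HostFrameworkNode(CoreClusterer core) {
        this.core = core;
    }

    /** The host hands over its rows; the adapter only converts them,
     *  so all mining logic stays in the core package. */
    List<int[]> execute(double[][] hostRows) {
        List<double[]> data = new ArrayList<>();
        for (double[] row : hostRows) {
            data.add(row.clone());
        }
        return core.run(data);
    }
}
```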
2.1 Subspace Clustering Algorithms

The first component of the CoreSC package contains the actual subspace clustering algorithms. In our extension, the user can select among a multitude of different algorithms. These algorithms include grid-based clustering techniques (CLIQUE [3], DOC/FastDOC [23], MineClus [25], SCHISM [24]), DBSCAN-based techniques (FIRES [15], INSCY [5], SUBCLU [17]), and optimization-based techniques for subspace clustering (PROCLUS [2], STATPC [19]). Each algorithm implements the interface
SubspaceAlgorithm, which defines the input and output of the algorithms. The input corresponds to a database of objects described by numerical features, i.e. each algorithm needs to be provided with a list of objects $\langle o_1, \ldots, o_n \rangle$ where $o_i \in \mathbb{R}^d$. The output of each algorithm is a list of subspace clusters $\langle C_1, \ldots, C_k \rangle$. Each subspace cluster $C_i$ represents the objects and relevant dimensions belonging to this cluster. Note that in subspace clustering each cluster has its individual set of relevant dimensions (cf. Figure 1). Thus, each subspace cluster corresponds to a tuple $C_i = (O_i, S_i)$ where $O_i$ represents the clustered objects by their object ids, i.e. $O_i \subseteq \{1, \ldots, n\}$, and $S_i$ represents the relevant dimensions of the cluster, i.e. $S_i \subseteq \{1, \ldots, d\}$.

It is worth mentioning that subspace clustering in general is not restricted to disjoint clusters; thus, the result set might contain clusters $C_i$ and $C_j$ (with $i \neq j$) where $O_i \cap O_j \neq \emptyset$ or $S_i \cap S_j \neq \emptyset$. Additionally, depending on the chosen algorithm, not necessarily each object or dimension needs to be part of some cluster, i.e. it might hold that $\bigcup_{i=1}^{k} O_i \neq DB$ or $\bigcup_{i=1}^{k} S_i \neq \{1, \ldots, d\}$.
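In Java terms, this contract could look roughly as follows. The interface name SubspaceAlgorithm and the cluster tuple $(O_i, S_i)$ are taken from the description above, while the exact method signature and the SubspaceCluster helper class are our own assumptions, not the actual KDD-SC code.

```java
import java.util.List;
import java.util.Set;

/** A subspace cluster C_i = (O_i, S_i): clustered object ids plus relevant dimensions. */
class SubspaceCluster {
    final Set<Integer> objectIds;     // O_i, a subset of {1, ..., n}
    final Set<Integer> relevantDims;  // S_i, a subset of {1, ..., d}

    SubspaceCluster(Set<Integer> objectIds, Set<Integer> relevantDims) {
        this.objectIds = objectIds;
        this.relevantDims = relevantDims;
    }
}

/** Common contract for all algorithms in CoreSC (signature is an assumption). */
interface SubspaceAlgorithm {
    /**
     * @param database the objects o_1, ..., o_n, each a d-dimensional feature vector
     * @return the detected subspace clusters C_1, ..., C_k; clusters may overlap
     *         in objects and dimensions, and not every object or dimension
     *         has to be covered by some cluster
     */
    List<SubspaceCluster> cluster(List<double[]> database);
}
```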
2.2 Data Generators

The second component of the core package contains a flexible data generator first introduced in [13], which generates synthetic data with hidden subspace clusters. These datasets can be used to evaluate the correctness of subspace clustering algorithms and to assess the methods' scalability. The data generator implements the interface SubspaceDataGenerator, which defines the two outputs of the generator. The first output corresponds to the generated data, i.e. as above it corresponds to a list of objects $\langle o_1, \ldots, o_n \rangle$ with $o_i \in \mathbb{R}^d$. The second output is the ground truth clustering. This ground truth specifies which clusters are hidden in the data and which clusters should be found by the subspace clustering algorithms. Accordingly, the second output is a list of subspace clusters $\langle C_1, \ldots, C_k \rangle$.
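For intuition, here is a toy generator in the same spirit. This is only a rough illustration of the idea, not the generator of [13], whose data model is considerably richer: one cluster is hidden in a randomly chosen subspace, its relevant dimensions are kept compact, all remaining values are uniform noise, and the $(O, S)$ ground truth is reported as the second output.

```java
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

/** Toy generator: n objects in d dimensions with one cluster hidden in a random subspace. */
class ToySubspaceDataGenerator {
    public static void main(String[] args) {
        int n = 100, d = 8, clusterSize = 30;
        Random rnd = new Random(42);

        // Pick three relevant dimensions S for the hidden cluster.
        Set<Integer> relevantDims = new TreeSet<>();
        while (relevantDims.size() < 3) {
            relevantDims.add(rnd.nextInt(d));
        }

        // Objects 0 .. clusterSize-1 form the hidden cluster O.
        Set<Integer> clusterObjects = new TreeSet<>();
        for (int i = 0; i < clusterSize; i++) {
            clusterObjects.add(i);
        }

        // First output: the data. Cluster members are compact in the
        // relevant dimensions; everything else is uniform noise.
        double[][] data = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) {
                if (clusterObjects.contains(i) && relevantDims.contains(j)) {
                    data[i][j] = 0.5 + 0.05 * rnd.nextGaussian();
                } else {
                    data[i][j] = rnd.nextDouble();
                }
            }
        }
        System.out.println("generated " + data.length + " objects in " + d + " dimensions");

        // Second output: the ground truth clustering (O, S).
        System.out.println("ground truth: O=" + clusterObjects + ", S=" + relevantDims);
    }
}
```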
2.3 Evaluation Measures

The third component provides implementations of evaluation measures for subspace clustering. Evaluation measures summarize the clustering result by a numerical value where, e.g., a high value indicates better quality of the clustering. Evaluation measures can be categorized into internal measures and external measures. While internal measures assess the quality of a clustering based on properties such as compactness or density, external measures compute the quality w.r.t. a ground truth clustering [8, 11]. Please note that the ground truth clustering can be any clustering: either generated by a data generator, provided manually by the user, or determined by an algorithm. Thus, besides comparing the result of a single algorithm against the ground truth, external measures can also be used to compare the results of two different algorithms on the same data. We provide several evaluation measures specifically designed for subspace clustering in our framework. These measures include CE, RNIA, Entropy, F1P, F1R, and E4SC. We kindly refer to [11] for a description of these measures.

In our extension, all evaluation measures implement the interface SCEvaluationMeasure. The interface specifies the input of these measures, which corresponds to the database on which the clustering is performed, and two subspace clustering results $\langle C_1, \ldots, C_k \rangle$ and $\langle C'_1, \ldots, C'_l \rangle$. The output of each measure is a numerical value summarizing the quality of the clustering. Since some measures provide more fine-grained evaluation results for each cluster individually, we additionally implemented the interface SCExtendedEvaluationMeasure. This interface allows retrieving an evaluation result for each cluster of the result individually.
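A sketch of the measure contract, reusing the SubspaceCluster class from the algorithm sketch above (again, the signatures are our own assumptions). The toy measure shown here, a mean best Jaccard overlap of object sets, is for illustration only and is none of the measures listed above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Contract for external evaluation measures (signature is an assumption). */
interface SCEvaluationMeasure {
    /** Compares a detected clustering against a reference clustering on the given data. */
    double evaluate(List<double[]> database,
                    List<SubspaceCluster> found,
                    List<SubspaceCluster> reference);
}

/** Toy measure: for each reference cluster, take the best Jaccard overlap of
 *  object sets among the found clusters, and average these values. */
class ObjectJaccardMeasure implements SCEvaluationMeasure {
    @Override
    public double evaluate(List<double[]> database,
                           List<SubspaceCluster> found,
                           List<SubspaceCluster> reference) {
        if (reference.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (SubspaceCluster ref : reference) {
            double best = 0.0;
            for (SubspaceCluster f : found) {
                Set<Integer> inter = new HashSet<>(ref.objectIds);
                inter.retainAll(f.objectIds);
                Set<Integer> union = new HashSet<>(ref.objectIds);
                union.addAll(f.objectIds);
                if (!union.isEmpty()) {
                    best = Math.max(best, (double) inter.size() / union.size());
                }
            }
            sum += best;
        }
        return sum / reference.size();
    }
}
```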
2.4 Visualization

The last component of the core package provides subspace clustering specific visualization and interaction principles. In our extension we integrated the CoDA [10], MCExplorer [12], and VISA [4] toolkits. While these techniques are independent of the used KDD framework, we additionally integrated further techniques exploiting the individual visualization capabilities of each framework. These methods are integrated in the framework-specific packages of KDD-SC.
3. KNIME EXTENSION
This section describes the usage of our extension within KNIME, termed KnimeSC. We demonstrated a first version of KnimeSC in [13]. KNIME is an open source data mining framework offering several benefits and is widely used in industry as well as in academia. It has a modern, user-friendly interface which allows modeling data mining workflows in an intuitive manner. In KNIME, a workflow is defined by a set of nodes, which can represent data sources and sinks, mining algorithms, transformations, visualizations, and further concepts. Each node has specific input and output ports depending on the node's functionality. The user establishes a new workflow by selecting a set of nodes from the node repository and then connects the corresponding input and output ports to steer the data flow between these nodes. Data mining workflows can be stored for later re-use, modification, or extension.

A major benefit of KNIME is the easy-to-use plugin concept. It allows KNIME to be extended by new features, represented as new nodes in the node repository. These new nodes can freely interact with the existing KNIME components, achieving a deep integration of our extension. Thus, all techniques already integrated in KNIME can be combined with our extension for mutual benefit.

Figure 3: Screenshot of the subspace clustering extension for KNIME (left: newly developed nodes; center: workflow; right: description of nodes)

Figure 3 shows a screenshot of KNIME and our extension. On the left, the newly developed nodes are illustrated in the node repository. Each node corresponds to one functionality provided by the CoreSC package. On the right, descriptions of each node and its corresponding input/output ports are given. In the center, the actual workflow is illustrated. In the following we provide details of our extension and the different types of nodes based on three different workflows.
Figure 3 shows a simple workflow where a data reader node ('Node 1'; here: reading data from an ARFF file) is connected with a subspace clustering algorithm node ('Node 2'; here: PROCLUS [2]). Accordingly, by specifying this workflow, the user applies PROCLUS on a given database.

As described in Section 2.1, each algorithm gets as input the database to be clustered. This is shown by the single input port of the node 'Node 2'. Considering the output of the node, we have to take care of the special format used in KNIME: the standard format to exchange information between nodes in KNIME is flat tables/relations. Since the output of each algorithm is a list of subspace clusters, which themselves are tuples describing sets of objects and sets of dimensions, each node needs to have two output ports. At the first output port, a table is provided which describes the relevant dimensions $S_i$ of each cluster $C_i = (O_i, S_i)$ via binary encoding. In Figure 3 this table is illustrated at the bottom ('Cluster Dimensions') and shows three subspace clusters found in the Iris dataset [9]. The cluster with ID 2, for example, is located in the dimensions 'sepallength', 'sepalwidth', and 'petalwidth'. The second output port provides the information which objects belong to the detected clusters (table 'Cluster Objects'). In the example, object 149 belongs to cluster 0, while object 17 belongs to cluster 1. Please note again that in subspace clustering each object might belong to multiple clusters, i.e. clusters might overlap due to different subspace projections. Thus, this output table corresponds to an $n:m$ relation. These two outputs can be forwarded to any other node included in the KNIME framework, as we will show next.
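To illustrate this flattening, the following sketch (our own illustration, reusing the SubspaceCluster class from the sketch in Section 2; KnimeSC itself builds real KNIME tables rather than printing text) produces the two relations: one row of 0/1 dimension flags per cluster, and one row per (object, cluster) membership pair.

```java
import java.util.List;

/** Illustration: flatten a subspace clustering result into two KNIME-style tables. */
class ClusterTablePrinter {

    /** 'Cluster Dimensions' table: one row per cluster, relevant dimensions as 0/1 flags. */
    static void printClusterDimensions(List<SubspaceCluster> clusters, int d) {
        for (int c = 0; c < clusters.size(); c++) {
            StringBuilder row = new StringBuilder("cluster " + c + ":");
            for (int j = 0; j < d; j++) {
                row.append(' ').append(clusters.get(c).relevantDims.contains(j) ? 1 : 0);
            }
            System.out.println(row);
        }
    }

    /** 'Cluster Objects' table: one row per (object, cluster) pair. This is an n:m
     *  relation, since an object may belong to several overlapping clusters. */
    static void printClusterObjects(List<SubspaceCluster> clusters) {
        for (int c = 0; c < clusters.size(); c++) {
            for (int objectId : clusters.get(c).objectIds) {
                System.out.println("object " + objectId + " -> cluster " + c);
            }
        }
    }
}
```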
Figure 4: Workflow to evaluate the result of a subspace clustering algorithm w.r.t. a ground truth clustering (synthetically generated)

A second workflow is illustrated in Figure 4. It models a task frequently performed in the scientific literature: a) generate synthetic data with a given clustering ground truth, b) apply an algorithm on the data, and c) measure whether the detected result matches the ground truth.

To solve this task with our framework, the user has to select a data generator node ('Node 1'). The node constructs synthetic data where the subspace clustering structure is known, i.e. the ground truth for clustering is given. Again, we have to take care that in KNIME the information is exchanged via flat tables. Thus, each data generator node has three output ports: first, the generated data; second, the relevant dimensions of each cluster; third, the cluster memberships of each object. The last two outputs have the same format as the outputs of the subspace clustering algorithm nodes described above. Connecting the first port of the data generator with an algorithm node ('Node 2') allows clustering the synthetic data.

Finally, to measure the quality of the detected results, the user can use an evaluation measure node ('Node 3'; here: the CE measure [22]). Such a node has five input ports: four ports are required to specify the two clustering results that should be compared (two ports for each clustering result), and one port for the database. Thus, in the figure, all three output ports of the data generator are connected to the measure, as well as the two output ports of the MineClus node.

Figure 5: Workflow to visualize the result of a subspace clustering algorithm via colored tables

The two outputs of each algorithm node already allow analyzing the detected clustering structure on a basic level. That is, by inspecting the corresponding tables (cf. Fig. 3) the user might get an impression of the relevant dimensions of the clusters and the supporting objects. Gaining further knowledge from these tables alone might be difficult, though; accordingly, for easy interpretation of the clustering results we include different visualizations.

One possible visualization is realized with the workflow depicted in Figure 5. The subspace visualization node generates results as shown in the table on the right. The table represents the original database where each row corresponds to one object. The objects belonging to the same cluster are highlighted with the same color. In the example, three clusters are shown. Additionally, the relevant dimensions of the clusters are depicted: a dimension is relevant if and only if there is a colored bar on the right hand side of the number. In the example, the green cluster is located in the subspace of dimensions 2 and 4, while the blue cluster is located in all four dimensions. Using this visualization, the user can easily compare the different subspaces of the clusters as well as the attribute values of the clustered objects. Considering, for example, the green cluster, we see that the attribute values in the first (and irrelevant) dimension are distributed in the broad range of 5.0-6.3, while the second (and relevant) dimension shows a deviation of only 2.0-2.4.

To obtain this visualization, the subspace visualization node requires three inputs. First, the database to be analyzed. Next, the relevant dimensions of each cluster with their corresponding coloring. This coloring is realized by using the Color Manager node provided by the KNIME framework. That is, the first output of the subspace clustering algorithm ('Node 2') is first forwarded to the Color Manager ('Node 3') before being used as an input of the subspace visualization node ('Node 4'). In the Color Manager node, the user can choose the color of each cluster. The last input required for the visualization is the cluster membership information, which can be directly transferred from 'Node 2'.
4. RAPIDMINER EXTENSION
In this section we present the usage of our extension within the RapidMiner framework. Similar to KNIME, RapidMiner models data mining workflows via a node-based interface, i.e. each node performs a certain task and has specific input and output ports. Information between different nodes is exchanged by connecting their corresponding ports.

Figure 6: Screenshot of the subspace clustering extension for RapidMiner (left: newly developed nodes; center: workflow with chained evaluation measures; right: parametrization and description of nodes)

Figure 6 shows a screenshot of RapidMiner and our extension. On the left of the screen, the newly developed nodes are shown. On the right, the parametrization of the currently selected node is illustrated (here: the parameters of the MineClus algorithm) and a description of the node is provided. In the center, one sees the actual workflow.

While the general interaction with RapidMiner is similar to KNIME, we briefly discuss some differences. The exchange of information between KNIME nodes is based on flat tables. Thus, we represented a subspace clustering result via two flat tables, describing the object groupings and the relevant dimensions of the clusters. In RapidMiner, information between nodes is exchanged based on Java objects. Thus, instead of using flat tables, we directly exchange the list of subspace clusters via the Java class
SubspaceClusterModel. Accordingly, in RapidMiner each node representing a subspace clustering algorithm has only a single output port (cf. the MineClus node in Figure 6), and each data generator node has only two output ports (one for the ground truth clustering and one for the generated database). These output ports are typed, i.e. they can only be connected to input ports which also accept subspace clustering results.
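Conceptually, the exchanged object is just a typed container around the cluster list; a minimal sketch could look as follows (the real SubspaceClusterModel in RapidSC will differ, since it additionally has to plug into RapidMiner's operator I/O classes; we therefore use a different class name and reuse the SubspaceCluster sketch from Section 2).

```java
import java.util.Collections;
import java.util.List;

/** Minimal sketch of a typed result container passed between RapidMiner-style operators. */
class SubspaceClusterModelSketch {
    private final List<SubspaceCluster> clusters;

    SubspaceClusterModelSketch(List<SubspaceCluster> clusters) {
        this.clusters = clusters;
    }

    /** Downstream nodes (evaluation measures, visualizations) read the clusters directly. */
    List<SubspaceCluster> getClusters() {
        return Collections.unmodifiableList(clusters);
    }
}
```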
In Figure 6, we see how the ground truth clustering of the data generator and the result of MineClus are forwarded to the evaluation measure E4SC. Additionally, the generated database is provided as an input for the measure. The fourth input port shows a further feature integrated in RapidMiner: the chaining of nodes. Here, the measures E4SC and CE are chained, i.e. all output ports of E4SC act as input ports for CE. While the first three output ports simply forward the three inputs of the measure, the last output port represents the result of the evaluation measure together with the results of all measures preceding this node in the chain. That is, in the workflow of Figure 6, the output of the CE measure is a list containing the results of both the E4SC and the CE measure. Chaining nodes makes workflows more compact and, thus, easier to understand.

Figure 7: RapidMiner workflow for applying the CoDA & MCExplorer visual analysis to a subspace clustering result

In Figure 7 we illustrate another RapidMiner workflow, modeling the analysis of subspace clustering results via the CoDA [10] and MCExplorer [12] toolkits. After reading data from an external source, the database is forwarded to the PROCLUS node. The clustering result of PROCLUS and the database are then transferred to the visualization node shown on the right. Please note that in RapidMiner we have to use a so-called 'Multiply' node when a single output port needs to be connected to multiple input ports. In the example, the loaded database is used as an input for PROCLUS as well as for the visualization. When activating the node on the right, a new window containing the CoDA and MCExplorer toolkits opens, in which the user can interact with the clustering result. A detailed description of the toolkits' functionalities is given in the original papers.
5. WEKA EXTENSION
Finally, our extension is integrated into WEKA as shown in Figure 8. The functionality and user interface of WekaSC correspond to OpenSubspace [20]. As already mentioned, the advantage introduced by our redesign is its implementation of the WEKA plugin interface and its easy extensibility through the component-based design.

The classical workflow to analyze data in WEKA differs from the previous two knowledge discovery frameworks. It is primarily a sequential process where a single dataset is loaded, preprocessed, and finally analyzed by an algorithm. The loading and preprocessing functionality of WEKA is integrated into the 'Preprocess' tab as shown in Figure 8.
Figure 8: Screenshot of the subspace clustering extension for WEKA. Shown are the results of a PROCLUS run and the possible evaluation measures that can be applied.
When integrating our extension into WEKA, three novel tabs appear. The 'Subspace Clustering' tab provides the major functionality of our extension. Here, the preprocessed data is analyzed using subspace clustering methods. The user can select among the multitude of implemented subspace clustering algorithms; in the example, the PROCLUS method is chosen. Additionally, the user can select different evaluation measures, which are applied once the result of the algorithm has been generated. As shown in the lower left part of the screenshot, different measures can be selected and the ground truth clustering to which the result is compared can be loaded. After starting the algorithm, the clustering result appears in the right part of the window. It represents the list of detected subspace clusters with their relevant dimensions in binary encoding as well as the number of objects per cluster and the corresponding object ids. This textual output corresponds to the two tables used in KNIME. The two remaining tabs, 'CoDA' and 'MCExplorer', can be used to analyze the clustering results based on the corresponding toolkits as described before.
6. CONCLUSION
Subspace clustering is an important mining task and is widely studied in the scientific community. In this paper, we presented our subspace clustering extension KDD-SC, which is integrated into the KDD frameworks KNIME, RapidMiner, and WEKA. Our extension provides subspace clustering functionality for these frameworks based on a common code basis, and it can be flexibly combined with the toolkits' existing features. Our KDD-SC extension is publicly available on the following website: http://dme.rwth-aachen.de/KDD-SC
Overall, our extension sets the stage for the wide applicability of subspace clustering in practical applications.
Acknowledgment.
We thank Emmanuel Müller, Ira Assent, and Timm Jansen for their excellent work on the OpenSubspace project, which is a foundation of CoreSC and WekaSC.
7. REFERENCES
[1] E. Achtert, H.-P. Kriegel, and A. Zimek. ELKI: A software system for evaluation of subspace clustering algorithms. In SSDBM, pages 580–585, 2008.
[2] C. C. Aggarwal, C. M. Procopiuc, J. L. Wolf, P. S. Yu, and J. S. Park. Fast algorithms for projected clustering. In ACM SIGMOD, pages 61–72, 1999.
[3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In ACM SIGMOD, pages 94–105, 1998.
[4] I. Assent, R. Krieger, E. Müller, and T. Seidl. VISA: Visual subspace clustering analysis. ACM SIGKDD Explorations Newsletter, 9(2):5–12, 2007.
[5] I. Assent, R. Krieger, E. Müller, and T. Seidl. INSCY: Indexing subspace clusters with in-process-removal of redundancy. In IEEE ICDM, pages 719–724, 2008.
[6] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, C. Sieb, K. Thiel, and B. Wiswedel. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization. Springer, 2007.
[7] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbor" meaningful? In ICDT, pages 217–235, 1999.
[8] I. Färber, S. Günnemann, H.-P. Kriegel, P. Kröger, E. Müller, E. Schubert, T. Seidl, and A. Zimek. On using class-labels in evaluation of clusterings. In MultiClust Workshop at SIGKDD, 2010.
[9] A. Frank and A. Asuncion. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2010.
[10] S. Günnemann, I. Färber, H. Kremer, and T. Seidl. CoDA: Interactive cluster based concept discovery. PVLDB, 3(1-2):1633–1636, 2010.
[11] S. Günnemann, I. Färber, E. Müller, I. Assent, and T. Seidl. External evaluation measures for subspace clustering. In ACM CIKM, pages 1363–1372, 2011.
[12] S. Günnemann, H. Kremer, I. Färber, and T. Seidl. MCExplorer: Interactive exploration of multiple (subspace) clustering solutions. In IEEE ICDM Workshops, pages 1387–1390, 2010.
[13] S. Günnemann, H. Kremer, R. Musiol, R. Haag, and T. Seidl. A subspace clustering extension for the KNIME data mining framework. In IEEE ICDM Workshops, pages 886–889, 2012.
[14] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: An update. ACM SIGKDD Explorations, 11(1):10–18, 2009.
[15] H.-P. Kriegel, P. Kröger, M. Renz, and S. H. R. Wurst. A generic framework for efficient subspace clustering of high-dimensional data. In IEEE ICDM, pages 250–257, 2005.
[16] H.-P. Kriegel, P. Kröger, and A. Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD, 3(1), 2009.
[17] P. Kröger, H.-P. Kriegel, and K. Kailing. Density-connected subspace clustering for high-dimensional data. In SDM, 2004.
[18] I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler. YALE: Rapid prototyping for complex data mining tasks. In ACM SIGKDD, pages 935–940, 2006.
[19] G. Moise and J. Sander. Finding non-redundant, statistically significant regions in high dimensional data: A novel approach to projected and subspace clustering. In ACM SIGKDD, pages 533–541, 2008.
[20] E. Müller, S. Günnemann, I. Assent, and T. Seidl. Evaluating clustering in subspace projections of high dimensional data. PVLDB, 2(1):1270–1281, 2009.
[21] L. Parsons, E. Haque, and H. Liu. Subspace clustering for high dimensional data: A review. ACM SIGKDD Explorations, 6(1):90–105, 2004.
[22] A. Patrikainen and M. Meila. Comparing subspace clusterings. IEEE Trans. Knowl. Data Eng., 18(7):902–916, 2006.
[23] C. M. Procopiuc, M. Jones, P. K. Agarwal, and T. M. Murali. A Monte Carlo algorithm for fast projective clustering. In ACM SIGMOD, pages 418–427, 2002.
[24] K. Sequeira and M. J. Zaki. SCHISM: A new approach for interesting subspace mining. In IEEE ICDM, pages 186–193, 2004.
[25] M. L. Yiu and N. Mamoulis. Frequent-pattern based iterative projected clustering. In IEEE ICDM, 2003.