On interestingness measures of formal concepts
Sergei O. Kuznetsov, Tatiana Makhalova
National Research University Higher School of Economics, Kochnovsky pr. 3, Moscow 125319, Russia. [email protected], [email protected]
Abstract.
Formal concepts and closed itemsets proved to be of big importance for knowledge discovery, both as a tool for concise representation of association rules and as a tool for clustering and constructing domain taxonomies and ontologies. Exponential explosion makes it difficult to consider the whole concept lattice arising from data; one needs to select the most useful and interesting concepts. In this paper interestingness measures of concepts are considered and compared with respect to various aspects, such as efficiency of computation, applicability to noisy data, and ranking correlation.
Keywords: Formal Concept Analysis, interestingness measures, closed itemsets
1 Introduction

Formal concepts play an important role in knowledge discovery, since they can be used for concise representation of association rules, clustering, and constructing domain taxonomies; see surveys [40, 41, 34]. Most of the difficulties in the application of closed itemsets (or, under a more common name, closed attribute sets) to practical datasets are caused by the exponential number of formal concepts, which complicates both the construction of the model and the analysis of the results. Recently, several approaches to tackle these issues were introduced. For a dataset, all closed sets of attributes ordered by the inclusion relation make a lattice; below we use the terminology of concept lattices and Formal Concept Analysis [21]. In terms of FCA, a concept is a pair consisting of an extent (a closed set of objects, or transactions in data mining terms) and an intent (the respective closed set of attributes, or itemset).

We propose to divide existing approaches to simplifying concept mining into the groups presented in Figure 1. Methods of the first group compute the concept lattice built on simplified data. The most general way to get smaller lattices is to reduce the size of datasets while preserving the most important information. For this purpose such methods as Singular Value Decomposition [15], Non-negative Matrix Decomposition [44], and the more computationally efficient k-means [16], fuzzy k-means [18], and agglomerative clustering of objects based on a similarity function of weighted attributes [17] are used. Another way to reduce the lattice size was proposed in [19]. This approach aims at reducing the number of incomparable concepts by making slight changes to the context.

[Fig. 1: Classification of methods for concept mining simplification. The figure shows a tree with three branches: preprocessing (modification of the original data: dimension reduction, changing of data entries, computing approximate concepts/clusters), computing a subset of concepts (application of an antimonotonic function, modification of a closure operator), and postprocessing (application of indices to concepts).]

Computing approximate concepts, or so-called bi-/triclusters, is becoming increasingly ubiquitous. In general, biclustering refers to performing simultaneous row–column clustering of real-valued data. Biclusters themselves can be defined in different ways [36], e.g., as a submatrix with constant values, with constant values on rows or columns, with coherent values, etc. In the case of binary data, a bicluster is defined as a submatrix with high density (i.e., proportion of 1s). In [24] the authors presented a set of evaluation criteria for triclusters: density, coverage, diversity, noise tolerance and cardinality. As their experiments showed, optimality by some criteria imposes non-optimality w.r.t. other criteria, and optimization of a particular criterion underlies a clustering algorithm itself. Put differently, in the case of multimodal clustering the choice of criteria for cluster evaluation defines the choice of a clustering algorithm rather than the selection of a subset of computed clusters.

Methods from the "Computing a subset of concepts" class aim at simplifying the analysis of a concept lattice by computing a subset of concepts. This can be done, e.g., by defining a function antimonotonic w.r.t. the size of concept extents. Computing concepts with extents exceeding a threshold was proposed in [35] and studied in relation to frequent itemset mining in [45], where the authors propose an algorithm for building iceberg lattices. In this case some interesting rare concepts can be discarded; thus, it yields fewer results.
From this perspective, the most promising approach is applying ∆-stability. The Σοφια (Sofia) algorithm for computing a subset of the most ∆-stable concepts was proposed in [10]; ∆-stability will be considered in Section 3.

The most well-reasoned way to build a subset of concepts is to modify the closure operator and restrict the creation of new concepts by involving background knowledge. In [8] attribute-dependency formulas (AD-formulas) were introduced. The authors define mandatory attributes for particular attributes of a context; if an attribute is included in a closed attribute set without its mandatory attributes, the concept is not generated. In [5] an analogous approach based on weights of attributes was proposed. The interestingness of concepts is estimated using an aggregation function (average, maximum or minimum), which is applied to a generator, a minimal generator, or to an intent. The authors also note the correspondence between the numerical approach and AD-formulas.

Some restricted closure operators naturally arise from the analyzed data and subject area. Carpineto and Romano [13] considered the document–term relation, where objects are documents and the terms used in documents are attributes. The authors proposed to modify the closure operator using a hierarchy of terms as follows: two different attributes are considered equivalent if they have a common ancestor that is a more general term.

In the last decade some polynomial-time algorithms for computing Galois sub-hierarchies were proposed, see [9, 2].

The main idea of methods from the latter group is to assess the interestingness of concepts by means of interestingness indices.
This approach does not have the disadvantages of the previously described methods, namely, getting concepts that only approximately correspond to the original objects or attributes, missing interesting concepts due to early termination of an algorithm, or a costly preprocessing procedure requiring the involvement of domain experts or another reliable source of background knowledge. This class, however, has its own drawbacks; for example, the exponential complexity of lattice computation can be aggravated by a high complexity of index computation. The most promising approach from this point of view is to compute a subset of concepts using the idea of antimonotonicity of an index. When the index value is antimonotonic w.r.t. the order relation of the concept lattice, one can start computing from the top concept and proceed top-down until concepts whose value falls below a given threshold are reached.

In this paper we focus on a thorough study of concept indices. We leave statistical significance tests of concepts beyond this study; some basic information on statistical approaches to itemset mining can be found in [48]. The paper is naturally divided into three parts. First, we investigate the main features of indices and discuss the intuition behind them, as well as their applicability in practice with regard to their computation cost. Second, we provide a framework for the development of new indices and give detailed information on the basic metrics and operations that can be used to create new indices. Third, we describe the results of a comparative study of the indices regarding the following aspects: estimation of concept interestingness, approximation of intractable indices and insensitivity to noise.

The rest of the paper is organized as follows. Section 2 briefly introduces the main definitions of Formal Concept Analysis. Section 3 is devoted to the description of indices. The main index features are given in Section 3.1.
Section 3.2 provides the basic information on indices and the respective formulas. In Section 3.3 we use the known indices and their features to build a concept lattice; then we reveal the most interesting concepts (i.e., groups of indices) w.r.t. certain indices. In Section 4 we propose guidelines for the development of new indices based on indices for arbitrary sets of attributes. We discuss the approaches to measuring interestingness and provide the basic metrics and operations that can be used for index construction. Section 5 focuses on a comparative study of the indices w.r.t. the most important tasks: selection of interesting concepts (Section 5.1), approximation of intractable indices (Section 5.2) and noise reduction (Section 5.3). In Section 6 we conclude and discuss future work.

2 Formal Concept Analysis: Basic Definitions
Here we briefly recall the main definitions of FCA [21]. Given a finite set of objects G and a finite set of attributes M, we consider an incidence relation I ⊆ G × M such that (g, m) ∈ I iff object g ∈ G has attribute m ∈ M. A formal context is a triple (G, M, I). The derivation operators (·)′ are defined for A ⊆ G and B ⊆ M as follows:

A′ = {m ∈ M | gIm for all g ∈ A},    B′ = {g ∈ G | gIm for all m ∈ B}.

A′ is the set of attributes common to all objects of A, and B′ is the set of objects sharing all attributes from B. The double application of (·)′ is a closure operator, i.e., (·)″ is extensive, idempotent and monotone. Sets A ⊆ G and B ⊆ M such that A″ = A and B″ = B are said to be closed.

A (formal) concept is a pair (A, B), where A ⊆ G, B ⊆ M, A′ = B and B′ = A. A is called the (formal) extent and B the (formal) intent of the concept (A, B).

A concept lattice (or Galois lattice) is a partially ordered set of concepts, the order being defined as follows: (A, B) ≤ (C, D) iff A ⊆ C (equivalently, D ⊆ B); the pair (A, B) is then a subconcept of (C, D) and (C, D) is a superconcept of (A, B). Each finite lattice has a highest element with A = G, called the top element, and a lowest element with B = M, called the bottom element.

[Fig. 2: Formal context of subsampling-based indices (on the left) and the corresponding concept lattice (on the right). Its objects are the indices stability, ∆_l, ∆_h (collapse index), stab_NOE, stab_OE, stab_OIE and robustness; its attributes are the features a₁: (designed) for closed subsets, a₂: not applicable to arbitrary subsets, a₃: based on comparison to other attribute subsets, a₄: based on comparison to neighboring attribute subsets, a₅: size-based, a₆: monotonic–antimonotonic w.r.t. order on attribute sets, a₇: using a tuning parameter, a₈: polynomial complexity, a₉: linear complexity, a₁₀: cubic and higher complexity, a₁₁: computable in one pass over data.]

Example 1.
Let us consider the formal context given in Fig. 2. The group of subsampling-based indices for formal concepts forms the set of objects; their essential features are the attributes. The corresponding concept lattice (see Fig. 2, on the right) consists of 8 concepts.
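The derivation operators can be sketched in a few lines of Python; the toy context below is a hypothetical example invented for illustration, not one of the contexts used in the paper.

```python
# Sketch of the derivation operators (.)' from this section on a toy context.
context = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"a", "b", "c"},
}
G = set(context)                    # objects
M = set().union(*context.values())  # attributes

def up(A):
    """A': attributes shared by all objects of A (M for the empty set)."""
    return set.intersection(*(context[g] for g in A)) if A else set(M)

def down(B):
    """B': objects having all attributes of B."""
    return {g for g in G if B <= context[g]}

A = {"g1", "g3"}
B = up(A)              # {"a", "b"}
assert down(B) == A    # A'' = A, so (A, B) is a formal concept
```

Applying down(up(·)) to any object set yields the smallest extent containing it, which is how all formal concepts of a context can be enumerated.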
The application of indices offers advantages over other approaches to reducing the number of analyzed concepts. For example, it allows a thorough study of a concept lattice without recomputing the lattice and ensures easily interpretable results by preserving objects and attributes exactly as in the original data. However, the exponential time complexity of lattice computation draws attention to antimonotonic indices, since they allow generating only a subset of formal concepts.

In this section we propose a hierarchical clustering of the existing indices for closed sets of attributes using an FCA-based approach. We describe the most important index features and use them to build a concept lattice of indices: we consider a formal context of indices and their features, and then compute the ordered set of formal concepts. Upon that, we apply some of the described indices to identify the most interesting (top-ranked) concepts.
Some of the indices originally designed to assess concepts ("designed for closed subsets") cannot be applied to an arbitrary set of attributes ("not applicable to arbitrary subsets"), whereas others were originally designed as measures for arbitrary attribute sets. In our study we consider only two indices for arbitrary sets of attributes, namely support/frequency, due to their importance in concept mining.

There exist indices that deal with each attribute of an intent (or of a whole context) separately and assess the contribution of these attributes to a concept ("using all of the single attributes of a context"). Another essential feature of concept indices is the use of other concepts ("based on comparison to other attribute subsets"). The concepts are selected based on a particular condition or index values, i.e., frequent, (dis)similar to some other concepts, etc. In this case an index is comprised of relative values or differences between absolute values of the selected concepts and the assessed concept. We additionally add only one predicate, "is neighbor" ("based on comparison to neighboring attribute subsets"), since the direct neighbors of a concept are highly important both for computing closed itemsets and for concept mining.

As noted previously, support has a special place in concept mining. We define "support-based (for subsets of attributes)" and "support-based (for single attributes)" features to characterize whether an index is based on support. Support may be regarded as the probability of an attribute or a set of attributes; along with it we also consider conditional probability ("using conditional (joint) probability").
"Size-based" indices use the size of an intent or an extent and the total number of attributes or objects; thus, they do not require assessing particular elements of a context.

As mentioned before, antimonotonicity is one of the most important index properties, since it allows one to overcome the exponential complexity of algorithms for generating concepts. Time complexity is captured by the following attributes: "polynomial complexity", "sublinear complexity", "linear complexity", "quadratic complexity", "cubic and higher complexity". These groups are defined w.r.t. the size of a context, i.e., |G||M|, under the assumption that the lattice has been computed. The possibility to compute indices in "one pass over data" is related to the complexity problem, but it mostly refers to memory complexity and the analysis of streaming data.

Some indices have a tuning parameter ("using a tuning parameter") that either changes the concept rankings or is used as a binarization threshold ("boolean value"). In our study we use the indices described below.
Stability
Stability indices for formal concepts were introduced in [30, 31] and modified in [33]. For a formal concept (A, B) the (integral) intensional stability index is the probability that B remains closed when removing a subset of objects from extent A, all subsets being equiprobable. The intensional stability is defined as follows:

Stab_i(A, B) = |{C ⊆ A | C′ = B}| / 2^{|A|}.

Intensional stability measures overfitting, i.e., the dependence of an intent (a closed pattern) on observations (objects of the respective extent). Extensional stability is defined dually.

In [11] it was noted that stability indices are antimonotonic w.r.t. a chain of projections. This class of antimonotonicity contains previously known classes [46] and allows one to generate the k most stable concepts with delay polynomial in k and the input size.

Estimates of stability
Logarithmic Stability
The problem of computing stability is #P-complete [31], which hampers its application in practice, where one usually faces large-sized contexts. In [4] it was proposed to use a Monte Carlo approximation of stability; in [12] a combination of Monte Carlo and an upper-bound estimate is described. The authors use the logarithmic scale of stability (the concept rankings remain the same):

LStab(c) = − log₂(1 − Stab(c)).

It allows one to deal with the problem described in [25]: closeness of stability values to 1 for large contexts. The bounds of stability are given by

∆_min(c) − log₂ |M| ≤ − log₂ Σ_{d ∈ DD(c)} 2^{−∆(c,d)} ≤ LStab(c) ≤ ∆_min(c),

where ∆_min(c) = min_{d ∈ DD(c)} ∆(c, d), DD(c) is the set of all direct descendants of c in the lattice, and ∆(c, d) is the size of the set difference between the extents of formal concepts c and d.

In [32] more precise upper bounds of the logarithmic stability were described. These estimates are based on two direct neighbors. The formulas are given below.

Max-disjoint-extents upper bound (stab_NOE). The estimate uses two lower neighbors: one of them has the maximal extent among all the lower neighbors, the second one has the maximal extent among the rest of the neighbors and does not share elements with the first one:
LStab(c) ≤ ∆_min(c) + ∆_min^∅(c, ∆_min(c)),

where ∆_min(c) = min_{d ∈ DD(c)} ∆(c, d) and ∆_min^∅(c, ∆_min(c)) = d₁ + d₂, with d₁ = min_{d ∈ DD(c)} ∆(c, d) and d₂ = min_{d ∈ DD(c)} {∆(c, d) | ext(d) ∩ ext(d₁) = ∅}, i.e., the minimum is taken over lower neighbors whose extent difference is disjoint from that of d₁.

Max-distinguished-extents upper bound (stab_OE). To compute this index the first neighbor is selected in the same way, while the second one meets the following condition: its extent has the maximal number of objects not included in the extent of the first one:

LStab(c) ≤ |d₁| + |d₂| − |d₁ ∩ d₂|,

where d₁ corresponds to min_{d ∈ DD(c)} ∆(c, d) and d₂ = min_{d ∈ DD(c)} {∆(c, d) | |d − d₁| = max_{d* ∈ DD(c)} |d* − d₁|}.

Max-extent upper bound (stab_OIE). The estimate takes into account two different maximal extents:

LStab(c) ≤ |d₁| + |d₂| − |d₁ ∩ d₂|,

where d₁ corresponds to min_{d ∈ DD(c)} ∆(c, d) and d₂ to the second smallest value, d₂ ≠ d₁. Such a "greedy" strategy can, however, give an underestimated upper bound.

Stability Indices. The notion of stability was revised and considered with regard to estimating concept-based hypotheses in [31], where level-wise stability indices are studied. For a formal concept c = (A, B) the stability index of the j-th level (2 ≤ j ≤ n − 1) is defined as follows:

J_j(c) = γ_j(c) / C(n, j),  where n = |A| and γ_j(c) = |{Y ⊂ A | |Y| = j, Y′ = B}|.

The integral stability index is defined as J_Σ(c) = Σ_{i=2}^{n−1} J_i(c). In this study we also consider integral stability indices of the j-th level (2 ≤ j ≤ n − 1):
– minor-set-based integral stability: J_Σj(c) = Σ_{i=2}^{j} J_i(c);
– major-set-based integral stability: J_Σj(c) = Σ_{i=j}^{n−1} J_i(c).

Integral stability of the j-th level may be regarded as an approximation of the stability index. The survey of the problem of best approximation is given in Section 5.

Robustness
Robustness [46] of an (arbitrary) set of attributes of a dataset was introduced to estimate the probability that a pattern would still be generated if some transactions (rows) of the dataset were removed. To compute this index one needs to generate 2^{|G|} subsamples, where G is the set of transactions (objects). Some classes of itemsets allow computing this index using an exact formula instead of subsampling the data. Here we use the formula for closed sets of attributes. For a formal concept (A, B) robustness is the probability that B remains closed when every object (row) of extent A is removed independently with probability 1 − α. Hence, the stability index can be considered as an instantiation of robustness for α = 0.5. So, for formal concept c = (A, B) the robustness is given as follows:

r(c, α) = Σ_{d ≤ c} (−1)^{|B_d| − |B_c|} (1 − α)^{|A_c| − |A_d|}.

Proposition 1. Stability of concept (A, B) is equal to its robustness for α = 0.5.

Proof.
In [43] it was noted that stability can be computed recursively by a traversal of the covering relation (i.e., the graph of the diagram) of the concept lattice from the bottom concept upwards as follows:

Stab(A, B) = |{C ⊆ A | C′ = B}| / 2^{|A|} = σ(A, B) / 2^{|A|},  where σ(A, B) = 2^{|A|} − Σ_{(C,D) < (A,B)} σ(C, D).

In [4] the following relation of stability to the Möbius function µ [22] was shown:

|{C ⊆ A | C′ = B}| = Σ_{(C,D) ≤ (A,B)} 2^{|C|} µ((C, D), (A, B)).

Using the Möbius function of the concept lattice, the formula takes the following form: σ(A, B) = Σ_{(C,D) ≤ (A,B)} 2^{|C|} µ((C, D), (A, B)), and stability can alternatively be represented as

Stab(A, B) = Σ_{(C,D) ≤ (A,B)} 2^{|C| − |A|} µ((C, D), (A, B)).

Robustness of concept (A, B) is computed as Σ_{k=0}^{|A|} a_k (1 − α)^k, where

a_k = Σ_{(C,D) ≤ (A,B), |A| − |C| = k} e(D, B),

and e(D, B) = 1 if D = B, and e(D, B) = − Σ_{(C,D) < (E,F) ≤ (A,B)} e(F, B) otherwise. This formula gives the Möbius function; thus robustness can be rewritten as

r((A, B), α) = Σ_{(C,D) ≤ (A,B)} µ((C, D), (A, B)) (1 − α)^{|A| − |C|}.

Replacing 1 − α by 0.5 turns (1 − α)^{|A| − |C|} into 2^{|C| − |A|}, which gives exactly the expression for Stab(A, B) above. □

Concept Probability
In [29] it was noticed that some interesting concepts with a small number of objects (i.e., small extents) usually have low stability values. Concept probability was proposed to get rid of this bias. The definition of concept probability from [29] is equivalent to the concept probability introduced earlier by R. Emilion [20]. The probability that an arbitrary object has all attributes from set B is defined as follows:

p_B = Π_{m ∈ B} p_m.

Concept probability is defined as the probability of B being closed:

p(B″ = B) = Σ_{k=0}^{n} p(|B′| = k, B″ = B) = Σ_{k=0}^{n} C(n, k) p_B^k (1 − p_B)^{n−k} Π_{m ∉ B} (1 − p_m^k),

where n = |G|. Concept probability aggregates three probabilistic components: the occurrence of each attribute from B in all k objects, the absence of at least one attribute from B in the other objects, and the absence of other attributes shared by all k objects.

Separation. This index was introduced in [29] to estimate the specificity of the object–attribute relation of a concept with respect to the formal context. It is defined as the part of the area covered by a formal concept among all nonzero elements in the rows and columns corresponding to the formal concept:

s(A, B) = |A||B| / (Σ_{g ∈ A} |g′| + Σ_{m ∈ B} |m′| − |A||B|).

Frequency (support)
It is one of the most popular measures in pattern mining. Frequency arises from the assumption that the most "interesting" concepts are the frequent ones:

supp(A, B) = |A| / |G|.

Support provides an efficient level-wise algorithm for computing the semilattice, since it exhibits antimonotonicity (the a priori property [1, 37]):

B₁ ⊆ B₂ → supp(B₁) ≥ supp(B₂).

In this study we say that a set of attributes is frequent if its support exceeds a certain threshold; thus, frequency of an attribute set means that it is frequent.
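The stability, robustness and support formulas introduced above can be checked on a small example. The sketch below (a hypothetical toy context; brute-force enumeration, so feasible only for tiny data) computes Stab(A, B) by definition, robustness by the closed-form sum over subconcepts, and verifies Proposition 1 as well as the a priori property of support.

```python
# Brute-force check of stability, robustness (Proposition 1) and the
# antimonotonicity of support on a hypothetical toy context.
from itertools import chain, combinations

context = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"a", "b", "c"},
}
G = set(context)
M = set().union(*context.values())

def intent(A):                     # A'
    return set.intersection(*(context[g] for g in A)) if A else set(M)

def extent(B):                     # B'
    return {g for g in G if B <= context[g]}

def subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

# all formal concepts (A, B): closed extents paired with their intents
extents = {frozenset(extent(intent(set(A)))) for A in subsets(G)}
concepts = [(A, frozenset(intent(A))) for A in extents]

def stability(A, B):
    """Stab(A, B) = |{C subset of A : C' = B}| / 2^|A| (exponential)."""
    return sum(1 for C in subsets(A) if intent(set(C)) == set(B)) / 2 ** len(A)

def robustness(A, B, alpha):
    """r((A,B), alpha) = sum over subconcepts (C,D) <= (A,B) of
    (-1)^(|D|-|B|) * (1-alpha)^(|A|-|C|)."""
    return sum((-1) ** (len(D) - len(B)) * (1 - alpha) ** (len(A) - len(C))
               for C, D in concepts if C <= A)

def supp(B):
    return len(extent(B)) / len(G)

for A, B in concepts:                                  # Proposition 1
    assert abs(robustness(A, B, 0.5) - stability(A, B)) < 1e-12

assert supp({"a"}) >= supp({"a", "b"}) >= supp({"a", "b", "c"})  # a priori
```

The enumeration of subsets is exponential, which is exactly why the estimates and the antimonotonicity-based algorithms discussed in this section matter for real data.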
Monocle
Monocle [47] is a method that defines concept weights based on a subset of concepts. The weight function has the following form:

w(c, H) = (|A| + Σ_{g ∈ A} N_G(g, H)) · (|B| + Σ_{m ∈ B} N_M(m, H)),

where N_G(g, H) = |{(A, B) | (A, B) ∈ H, g ∉ A}| is the number of concepts in H not containing object g, N_M(m, H) is defined similarly for attributes, and H ⊂ L. The weight function is monotone w.r.t. the size of H.

δ-Tolerance Closed Frequent Itemsets (δ-TCFIs). δ-TCFIs [14] use the subset of lower neighbors (direct descendants) in the concept lattice to assess concept interestingness. The main idea is to select concepts that are relatively frequent with respect to their direct descendants. Concept c = (A, B) is a δ-TCFI iff it meets the following condition: for every d = (C, D) such that |D| = |B| + 1, supp(D) < (1 − δ) · supp(B), where δ ∈ [0, 1] is a tolerance factor.
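A minimal sketch of the δ-TCFI condition, following the definition in [14]: a closed itemset is kept iff every closed superset with one more attribute retains less than a (1 − δ) fraction of its support. The itemsets and support values below are invented for illustration.

```python
# Hypothetical sketch of the delta-TCFI test; supports are invented.

def is_delta_tcfi(supp_B, direct_superset_supports, delta):
    """supp_B: support of closed itemset B;
    direct_superset_supports: supports of closed D with |D| = |B| + 1."""
    return all(s < (1 - delta) * supp_B for s in direct_superset_supports)

supers = [0.30, 0.20]                      # supports of the two closed supersets
assert is_delta_tcfi(0.50, supers, 0.1)    # both below 0.45: B is kept
assert not is_delta_tcfi(0.50, supers, 0.5)  # 0.30 >= 0.25: B is pruned
```

Note that a larger δ prunes more concepts, since it widens the band of "almost the same support" around each superset.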
Margin-closed itemset
The index was proposed by Moerchen et al. [38]. A margin-closed itemset has no supersets with almost the same support; by definition, it satisfies the following expression:

X ∈ FI and ∀X₁ ∈ FI: X ⊂ X₁ ⇒ supp(X₁) / supp(X) ≤ 1 − α,

where FI is a set of frequent itemsets. As can be seen from the formulas, the margin-closed itemset and δ-TCFI indices are very close to each other: they are based on the same quantities computed on different subsets of concepts. In our study we consider a relaxed formulation of the margin-closed itemset index, namely, the ratio of the maximal extent size among all direct predecessors of a concept to the extent size of the concept itself.

Belohlavek and Trnecka [6, 7] investigated the group of so-called "basic level" measures. It is a psychology-motivated approach designed to formalize the existing psychological approach to defining the basic level of concepts [39]. The group comprises similarity- and predictability-based indices, cue validity, category feature collocation and category utility indices.

Similarity approach (S)
The similarity approach to the assessment of belonging to a basic level was proposed in [42] and subsequently formalized and applied to FCA in [6]. This index is a combination of three fuzzy functions that correspond to the formalized properties outlined by E. Rosch [42]: high cohesion of a concept, considerably greater cohesion with respect to upper neighbors, and slightly smaller cohesion with respect to lower neighbors. The membership degree of the basic level is defined as follows:

BL_S = coh**(A, B) ⊗ coh**_un(A, B) ⊗ coh**_ln(A, B),

where coh is a fuzzy function that corresponds to the conditions defined above and ⊗ is a t-norm [28].

A cohesion function is based on the pairwise similarity of objects from an extent. To assess similarity between two objects the authors use the simple matching coefficient or the Jaccard similarity:

sim_SMC(B₁, B₂) = (|B₁ ∩ B₂| + |M − (B₁ ∪ B₂)|) / |M|;    sim_J(B₁, B₂) = |B₁ ∩ B₂| / |B₁ ∪ B₂|.

A cohesion function is one of the following aggregation functions:

coh^a(A, B) = Σ_{{x₁,x₂} ⊆ A, x₁ ≠ x₂} sim(x₁, x₂) / (|A|(|A| − 1)/2),
coh^m(A, B) = min_{x₁,x₂ ∈ A} sim(x₁, x₂).

Rosch's properties for upper and lower neighbors take the following forms:

coh^{a*}_un(A, B) = 1 − (Σ_{c ∈ UN(A,B)} coh*(c) / coh*(A, B)) / |UN(A, B)|,
coh^{a*}_ln(A, B) = (Σ_{c ∈ LN(A,B)} coh*(A, B) / coh*(c)) / |LN(A, B)|,
coh^{m*}_un(A, B) = 1 − max_{c ∈ UN(A,B)} coh*(c) / coh*(A, B),
coh^{m*}_ln(A, B) = min_{c ∈ LN(A,B)} coh*(A, B) / coh*(c),

where UN(A, B) and LN(A, B) are the upper and lower neighbors of formal concept (A, B), respectively.

As the authors noted, experiments revealed that the type of cohesion function does not affect the result, while the choice of a similarity measure can greatly affect the outcome. Moreover, in some cases upper (lower) neighbors may have higher (lower) cohesion than the formal concept itself (for example, in boundary cases, when a neighbor's extent or intent is comprised of identical rows or columns). To tackle this issue of neighbors non-monotonic w.r.t. a similarity function, the authors proposed to set coh**_ln and coh**_un to 0 if the rate of non-monotonic neighbors is larger than a threshold.

Below, we use the following notation: S**_SMC and S**_J, where the stars are replaced by the types of cohesion functions for neighbors and objects, respectively; SMC and J stand for the simple matching coefficient and the Jaccard similarity, respectively.

Predictability approach (P)
Predictability [7] of a formal concept is computed in a way quite similar to the previous one. In this approach the cohesion function is replaced by the predictability function:

P(A, B) = pred**(A, B) ⊗ pred**_un(A, B) ⊗ pred**_ln(A, B).

From this point of view, concepts are close to the basic level if only few attributes outside B are contained in objects from A:

E(I[⟨x, y⟩ ∈ I] | I[x ∈ A]) = −(|A ∩ y′| / |A|) log(|A ∩ y′| / |A|),

pred(A, B) = 1 − Σ_{y ∈ M−B} E(I[⟨x, y⟩ ∈ I] | I[x ∈ A]) / |M − B|.

Cue Validity (CV), Category Feature Collocation (CFC), Category Utility (CU)
The following measures are based on the conditional probability of extent A given an attribute y [7]:

CV(A, B) = Σ_{y ∈ B} p(A | y) = Σ_{y ∈ B} |A| / |y′|,

CFC(A, B) = Σ_{y ∈ M} p(A | y) p(y | A) = Σ_{y ∈ M} (|A ∩ y′| / |y′|) (|A ∩ y′| / |A|),

CU(A, B) = p(A) Σ_{y ∈ M} [p(y | A)² − p(y)²] = (|A| / |G|) Σ_{y ∈ M} [(|A ∩ y′| / |A|)² − (|y′| / |G|)²].

CV deals with the probability of an extent given attributes from an intent, CFC takes into account the relation between all attributes of a context and the intent of a formal concept, and CU characterizes how much an attribute from an intent is specific for a given concept rather than for the formal context [49].
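The three measures can be sketched directly from their symbolic definitions; the toy context below is a hypothetical example, and the instantiations of p(A|y), p(y|A) and p(y) in code follow the symbolic forms above.

```python
# Hedged sketch of CV, CFC and CU on a hypothetical toy context.
context = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"a", "b", "c"},
}
G = set(context)
M = set().union(*context.values())

def aext(y):                       # y': objects having attribute y
    return {g for g in G if y in context[g]}

def cv(A, B):                      # sum over y in B of p(A|y)
    return sum(len(A) / len(aext(y)) for y in B)

def cfc(A, B):                     # sum over y in M of p(A|y) * p(y|A)
    return sum(len(A & aext(y)) / len(aext(y)) * len(A & aext(y)) / len(A)
               for y in M)

def cu(A, B):                      # p(A) * sum over y of p(y|A)^2 - p(y)^2
    return len(A) / len(G) * sum((len(A & aext(y)) / len(A)) ** 2 -
                                 (len(aext(y)) / len(G)) ** 2 for y in M)

A, B = {"g1", "g3"}, {"a", "b"}    # a formal concept of this context
```

For this concept, cv(A, B) = 2/3 + 2/2 = 5/3: the attribute b, shared by exactly the objects of A, contributes the full 1, while the ubiquitous a contributes less.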
In this part of the paper we propose a classification of the existing indices for formal concepts. The classification can be done w.r.t. several features; thus, many different index classifications may be suggested. FCA provides a universal framework for discovering domain structure by constructing inclusion-ordered overlapping clusters with various degrees of generality.

We represent the indices described in Section 3.2 and their features from Section 3.1 as a formal context of size 20 × 19. The objects and attributes of the formal context are indices and their features, respectively. The context is given in Table 1; the corresponding lattice consists of 73 formal concepts.

We applied the described indices to this lattice to discover interesting concepts and consider the top-8 concepts (about 10% of all concepts) by the values of the following indices: probability, separation, monocle, margin-closed, frequency, stability, stability estimates, CV, CFC, CU, predictability, the similarity approach (S^aa_J and S^aa_SMC), and robustness with α = 0.3. Below, the most frequent concepts in the top-ranked groups are listed. We obtained top-ranked concepts with a singleton as an extent or an intent for margin-closed and probability, respectively. The most interesting groups of indices (concepts), with their frequencies (i.e., the rate of top-8 groups where a concept has been included), are listed below.
Frequency = 0.4:
Extent: ∆_l, ∆_h, stab_OIE, δ-TCFIs.
Intent: for closed subsets, not applicable to arbitrary subsets, based on comparison to other attribute subsets, based on comparison to neighboring attribute subsets, size-based, (anti)monotonic, polynomial complexity, linear complexity, computable in one pass over data.

Extent: ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, margin-closed*, δ-TCFIs.
Intent: for closed subsets, not applicable to arbitrary subsets, based on comparison to other attribute subsets, based on comparison to neighboring attribute subsets, size-based, (anti)monotonic, polynomial complexity, linear complexity.
The most interesting concepts by stability and its estimates.

Frequency = 0.33:
Extent: ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, similarity, predictability, margin-closed*, δ-TCFIs.
Intent: for closed subsets, not applicable to arbitrary subsets, based on comparison to other attribute subsets, based on comparison to neighboring attribute subsets, polynomial complexity.
The most interesting concepts by CV and separation.

Table 1: A formal context of indices for formal concepts and their key features
[Table 1 lists the 20 indices (stability, ∆_l, ∆_h (collapse index), stab_NOE, stab_OE, stab_OIE, robustness, probability, separation, support, frequency, monocle, δ-TCFIs, margin-closed, margin-closed*, similarity, predictability, CV, CFC, CU; all described in Section 3.2) as rows against the 19 features a₁–a₁₉ as columns:
a₁: (designed) for closed subsets; a₂: not applicable to arbitrary attribute subsets; a₃: applicable to arbitrary attribute subsets; a₄: using all of the single attributes of a context; a₅: based on comparison to other attribute subsets; a₆: based on comparison to neighboring attribute subsets; a₇: support-based (for subsets of attributes); a₈: support-based (for single attributes); a₉: size-based; a₁₀: using conditional (joint) probability; a₁₁: monotonic–antimonotonic w.r.t. order on attribute sets; a₁₂: using a tuning parameter; a₁₃: boolean value; a₁₄: polynomial complexity; a₁₅: sublinear complexity; a₁₆: linear complexity; a₁₇: quadratic complexity; a₁₈: cubic and higher complexity; a₁₉: computable in one pass over data.]

Extent: ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, separation, margin-closed, margin-closed*, δ-TCFIs.
Intent: for closed subsets, size-based.
The most interesting concepts by frequency.

Frequency = 0.25:
Extent: ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, robustness, margin-closed*, δ-TCFIs.
Intent: for closed subsets, not applicable to arbitrary subsets, based on comparison to other attribute subsets, based on comparison to neighboring attribute subsets, (anti)monotonic.

Extent: stability, ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, probability, robustness, separation, similarity, predictability, CV, CFC, CU, margin-closed, margin-closed*, monocle, δ-TCFIs.
Intent: for closed subsets.

Extent: ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, robustness, similarity, predictability, margin-closed*, δ-TCFIs.
Intent: for closed subsets, not applicable to arbitrary subsets, based on comparison to other attribute subsets, based on comparison to neighboring attribute subsets.
The most interesting concepts by monocle.

Extent: ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, robustness, CV, CFC, margin-closed*, δ-TCFIs.
Intent: for closed subsets, (anti)monotonic.
The most interesting concepts by CFC.
Extent: ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, probability, margin-closed*, δ-TCFIs.
Intent: for closed subsets, not applicable to arbitrary subsets, polynomial complexity, linear complexity.

The most interesting concepts by CU.
Extent: support, frequency, δ-TCFIs.
Intent: support-based (for attribute sets), (anti)monotonic, polynomial complexity, computable in one pass over data.

The most interesting concepts by robustness, α = 0.

The most interesting concepts by predictability and similarity (Basic Level Metrics).

As can be seen from the results of the concept rankings, most of the selected indices (and, accordingly, the rankings) include either ∆-indices or support-based measures. Both types of concepts have almost the same intent, which causes the large number of quite similar concepts.

In this section we propose guidelines for applying arbitrary itemset indices to closed ones. It should be noted that the problem of assessing arbitrary itemsets has been thoroughly investigated in the literature, and many indices for arbitrary sets of attributes and association rules have been proposed. In this study we do not consider statistical approaches to pattern and rule assessment (basic information on this approach can be found in [48]). Some indices for association rules can be applied directly to evaluate sets of attributes; for example, lift given in Table 2 takes the following form for itemset assessment: lift(A) = P(A) / ∏_{a∈A} P(a). Moreover, a rule-assessment measure may be applied to an attribute set within the rule-based approach. In this case, all possible rules are generated from the examined attribute set and then an aggregation function is used to obtain a new index (see details in [48]).

All these indices can be applied directly to formal concepts to assess them either as subsets of objects or as subsets of attributes. However, the closedness of attribute sets can be exploited to construct new indices.
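The itemset form of lift mentioned above can be sketched in a few lines; the toy context and the function names here are ours, not from the paper:

```python
# Sketch: the itemset form of lift, lift(A) = P(A) / prod_{a in A} P({a}),
# computed on a hypothetical toy context (object -> attribute set).
from functools import reduce

context = {
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"b", "c"},
    "g4": {"a"},
}

def support(itemset, ctx):
    """P(A): share of objects whose description contains the whole itemset."""
    return sum(1 for attrs in ctx.values() if itemset <= attrs) / len(ctx)

def lift(itemset, ctx):
    # Divide the joint support by the product of single-attribute supports.
    denom = reduce(lambda acc, a: acc * support({a}, ctx), itemset, 1.0)
    return support(itemset, ctx) / denom

print(lift({"a", "b"}, context))  # 0.5 / (0.75 * 0.75) = 8/9
```

A value above 1 indicates that the attributes co-occur more often than under independence.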
The purpose of this section is to present the main ideas of how to adapt arbitrary itemset indices (including rule-based measures) to assess both dimensions of concepts simultaneously. To develop a new index one needs to answer the following questions:
– Which concepts are interesting (which patterns are we looking for)?
– Which index will be chosen as the basic measure?
– Which operations will be used to aggregate the values of the basic measure?

The starting point is the personal perspective on concept interestingness. We define the following ones:
– Some "internal"/"external" properties of the concept itself. In this case homogeneous components are considered.
– Impact of gathering the elements. Here, a measure is applied to a set and to its elements: f(set) and {f(element) | element ∈ set}.
– Impact of a condition: f(parameter | condition) and f(parameter).
– Concept pureness. This approach is based on the comparison of features computed on a concept and outside of it: f(parameter) and f(U \ parameter), where U is the domain of the parameter.
– Stability to random changes in data. A resampling strategy is applied to a set: {resampling_α(set)}, where α is a noise rate.

The most popular basic measures are given in Table 2 to make the paper self-contained. More detailed information about the listed indices can be found in [23, 3, 27]. In the case of formal concepts, indices for association rules can be used not only in the ways mentioned above, but also to compare a concept to other similar concepts, i.e., to discover relatively interesting concepts in a neighborhood. To combine values of basic measures an aggregation function is used.
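As a concrete illustration of the aggregation step, here is a minimal sketch (function names are ours) of two fuzzy aggregation operators of the kind listed in Table 3, each t-norm paired with its dual s-norm:

```python
# Sketch of fuzzy aggregation operators (cf. Table 3); each t-norm t is
# paired with its dual s-norm s via t(v1, v2) = 1 - s(1 - v1, 1 - v2).

def t_algebraic(v1, v2):       # algebraic product
    return v1 * v2

def s_probabilistic(v1, v2):   # probabilistic sum, dual of the algebraic product
    return v1 + v2 - v1 * v2

def t_bounded(v1, v2):         # bounded difference (Lukasiewicz t-norm)
    return max(0.0, v1 + v2 - 1.0)

def s_bounded(v1, v2):         # bounded sum, dual of the bounded difference
    return min(1.0, v1 + v2)

# The duality holds pointwise on [0, 1]:
for t, s in [(t_algebraic, s_probabilistic), (t_bounded, s_bounded)]:
    for v1, v2 in [(0.2, 0.7), (0.5, 0.5), (1.0, 0.3)]:
        assert abs(t(v1, v2) - (1.0 - s(1.0 - v1, 1.0 - v2))) < 1e-12
```

Any of these can serve as the binary step of an aggregation over the values of a basic measure when those values lie in [0, 1].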
If a new index measures a relative property of a concept, then aggregation and comparison operators may be applied in either order, i.e., compare(aggregate_{el∈S}(el), d) or aggregate_{el∈S}(compare(el, d)), where S is a set of peer elements and d is a distinguished value to compare with.

It is worth noting that both real-valued and fuzzy functions can be used to aggregate values. If the values are in the interval [0, 1] (i.e., they can be considered as probabilistic components), a fuzzy aggregation function can be applied. Some binary fuzzy operators (a t-norm and its dual s-norm, related by the equation t(v_1, v_2) = 1 − s(1 − v_1, 1 − v_2)) and real-valued aggregation functions are given in Table 3. In the case of poor data quality a robust aggregation function, such as the median, can be applied to the data. A comparison function can take one of the following forms: x − y, x / y, log(x / y).

Example 2.
Let us consider how the proposed guidelines can be used for designing a new index. We give examples to show that the existing indices conform to the presented approach and to demonstrate how new indices can be constructed step by step using this scheme.
Cue Validity (CV):

CV(A, B) = Σ_{y∈B} P(A | y) = Σ_{y∈B} |A| / |y′|,

where y′ denotes the set of objects having attribute y.
– Notion of interestingness: a concept (A, B) is interesting if the objects from its extent A are more specific to the set of attributes B. We study only a subset of attributes, i.e., properties of the concept itself.
– Basic measure: conditional probability.
– Aggregation operator (applied to homogeneous elements): sum.
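A minimal sketch of Cue Validity on a hypothetical toy context (the context and names are ours, assuming y′ denotes the set of objects with attribute y):

```python
# Toy sketch of Cue Validity for a formal concept (A, B):
# CV(A, B) = sum over y in B of P(A | y) = sum over y in B of |A| / |y'|.

context = {                       # object -> its attributes (hypothetical)
    "g1": {"a", "b"},
    "g2": {"a", "b"},
    "g3": {"a"},
    "g4": {"b", "c"},
}

def attribute_extent(y, ctx):
    """y': all objects possessing attribute y."""
    return {g for g, attrs in ctx.items() if y in attrs}

def cue_validity(extent, intent, ctx):
    # Each term |A| / |y'| is P(A | y), since A is contained in y'.
    return sum(len(extent) / len(attribute_extent(y, ctx)) for y in intent)

# For the concept ({g1, g2}, {a, b}): |a'| = 3 and |b'| = 3
print(cue_validity({"g1", "g2"}, {"a", "b"}, context))  # 2/3 + 2/3 = 4/3
```

The larger the value, the more the extent dominates the extents of the individual attributes of the intent.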
A new index:

Index(A, B) = min_{(C,D) ∈ U_N((A,B))} (PS(D) − PS(B)),

where U_N((A, B)) is the set of direct ancestors of (A, B) and PS(B) = P(B) − ∏_{m∈B} P(m).
– Notion of interestingness: a concept (A, B) is interesting if it differs from more general (in the descriptive sense) concepts. We study only direct ancestors in a lattice.
– Basic measure: Piatetsky-Shapiro.
– Aggregation operator (applied to non-homogeneous elements): comparison of a concept to each of its ancestors, aggregation by taking the minimal value.

Table 2: Indices for arbitrary itemsets (association rules A → B)

accuracy: P(AB) + P(¬A¬B)
added value / change of support: P(B|A) − P(B)
certainty factor: (P(B|A) − P(B)) / (1 − P(B))
collective strength: (P(AB) + P(¬B|¬A)) / (P(A)P(B) + P(¬A)P(¬B)) · (1 − P(A)P(B) − P(¬A)P(¬B)) / (1 − P(AB) − P(¬B|¬A))
conditional probability: P(B|A)
conviction: P(A)P(¬B) / P(A¬B)
cosine: P(AB) / √(P(A)P(B))
Gini index: P(A)(P(B|A)² + P(¬B|A)²) + P(¬A)(P(B|¬A)² + P(¬B|¬A)²) − P(B)² − P(¬B)²
information gain: log(P(AB) / (P(A)P(B)))
J-measure: P(AB) log(P(B|A) / P(B)) + P(A¬B) log(P(¬B|A) / P(¬B))
Jaccard: P(AB) / (P(A) + P(B) − P(AB))
Klosgen: √P(AB) (P(B|A) − P(B)), or √P(AB) max(P(B|A) − P(B), P(A|B) − P(A))
Laplace correction: (N(AB) + 1) / (N(A) + 2)
least contradiction: (P(AB) − P(A¬B)) / P(B)
leverage: P(B|A) − P(A)P(B)
lift: P(AB) / (P(A)P(B))
Loevinger: 1 − P(A)P(¬B) / P(A¬B)
normalized mutual information: Σ_i Σ_j P(A_i B_j) log(P(A_i B_j) / (P(A_i)P(B_j))) / (−Σ_i P(A_i) log P(A_i))
odd multiplier: P(AB)P(¬B) / (P(B)P(A¬B))
example and counterexample rate: 1 − P(A¬B) / P(AB)
odds ratio: P(AB)P(¬A¬B) / (P(A¬B)P(¬AB))
one-way support: P(B|A) log(P(AB) / (P(A)P(B)))
Pearson's χ²: |G| ((P(AB) − P(A)P(B))² / (P(A)P(B)) + (P(¬AB) − P(¬A)P(B))² / (P(¬A)P(B)) + (P(A¬B) − P(A)P(¬B))² / (P(A)P(¬B)) + (P(¬A¬B) − P(¬A)P(¬B))² / (P(¬A)P(¬B)))
Piatetsky-Shapiro: P(AB) − P(A)P(B)
relative risk: P(B|A) / P(B|¬A)
Sebag-Schoenauer: P(AB) / P(A¬B)
two-way support: P(AB) log(P(AB) / (P(A)P(B)))
linear correlation coefficient: (P(AB) − P(A)P(B)) / √(P(A)P(B)P(¬A)P(¬B))
Zhang: (P(AB) − P(A)P(B)) / max(P(AB)P(¬B), P(B)P(A¬B))

Table 3: Aggregation functions

Real-valued, numerical:
Sum: Σ_{v∈V} v
Arithmetic mean: Σ_{v∈V} v / |V|
Geometric mean: (∏_{v∈V} v)^{1/|V|}
Harmonic mean: |V| / Σ_{v∈V} v⁻¹

Real-valued, ordinal (for sorted values X_(1), X_(2), ..., X_(n)):
Median: X_((n+1)/2) if n is odd, 0.5(X_(n/2) + X_(n/2+1)) if n is even
Maximum: X_(n)
Minimum: X_(1)
Midrange: 0.5(X_(1) + X_(n))

Fuzzy t-norms (with their dual s-norms):
Drastic product: min(v_1, v_2) if max(v_1, v_2) = 1, and 0 otherwise; drastic sum: max(v_1, v_2) if min(v_1, v_2) = 0, and 1 otherwise
Bounded difference: max(0, v_1 + v_2 − 1); bounded sum: min(1, v_1 + v_2)
Einstein product: v_1 v_2 / (2 − (v_1 + v_2 − v_1 v_2)); Einstein sum: (v_1 + v_2) / (1 + v_1 v_2)
Algebraic product: v_1 v_2; probabilistic sum: v_1 + v_2 − v_1 v_2
Hamacher product: v_1 v_2 / (v_1 + v_2 − v_1 v_2); Hamacher sum: (v_1 + v_2 − 2 v_1 v_2) / (1 − v_1 v_2)
Minimum: min(v_1, v_2); maximum: max(v_1, v_2)

A new index:
Index(A, B) = |M \ B| / Σ_{m∈M\B} P(m | B)⁻¹.
– Notion of interestingness: a concept (A, B) is interesting if it stands out from the context. We study out-of-concept elements of a context.
– Basic measure: conditional probability.
– Aggregation operator (applied to homogeneous elements): harmonic mean.
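A minimal sketch of this index on a hypothetical toy context, taking P(m | B) as the share of objects in the extent B′ that also carry attribute m (the data and names are ours):

```python
# Sketch of the index above: the harmonic mean of P(m | B) over the
# out-of-concept attributes m in M \ B.

context = {                       # object -> its attributes (hypothetical)
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a", "c"},
    "g4": {"c"},
}
M = {"a", "b", "c"}               # the full attribute set of the context

def extent_of(intent, ctx):
    """B': all objects whose description contains the intent."""
    return {g for g, attrs in ctx.items() if intent <= attrs}

def pureness_index(intent, ctx, attributes):
    extent = extent_of(intent, ctx)
    outside = attributes - intent
    # P(m | B): share of objects in the extent that also have attribute m
    probs = [sum(1 for g in extent if m in ctx[g]) / len(extent)
             for m in outside]
    if any(p == 0.0 for p in probs):
        return 0.0                # harmonic mean degenerates when some P = 0
    return len(outside) / sum(1.0 / p for p in probs)

print(pureness_index({"a", "b"}, context, M))  # P(c | {a,b}) = 1/2 -> 0.5
```

Low values mean the out-of-concept attributes rarely occur on the extent, i.e., the concept is "pure" and stands out from the context.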
In practice, the application of indices is mostly aimed at one of the following goals: selecting a subset from the whole set of concepts, or computing only a fragment of a concept lattice. The first goal is attained by selecting the most interesting concepts from the set of concepts or by detecting "original" concepts computed on a noisy context. The second goal is to reduce the computation cost by constructing only the most interesting concepts.
To find out how similar the indices are, we examined the similarity of the concept rankings induced by the index values. The pairwise similarity of indices is measured by the Kendall tau correlation coefficient [26]; this coefficient takes into account not only an absolute rank, but also a relative position.

We randomly generated 4 groups of formal contexts with the following densities (the rate of "1" entries in the context): 0.1, 0.2, 0.3, 0.4. Each group consisted of 100 formal contexts with the number of attributes ranging between 10 and 50 and the number of objects varying from 40 to 80.

The standard deviation of within-group pairwise correlation is quite small (not more than 0.05) and does not depend on the context density (the p-value of Levene's test is less than 0.05). Since the studied values cannot be assumed to be normally distributed (based on D'Agostino and Pearson's test), the Wilcoxon test was used to compare mean values of the Kendall tau coefficient.

The averaged values of the pairwise Kendall tau coefficients are presented in Table 4. Among the investigated indices several groups of correlated measures stood out, but only few values of pairwise correlation are statistically stable (have the same average value w.r.t. context densities).

One of the groups of correlated indices corresponds to approximate robustness with different values of parameters. It allows us to conclude that the relative importance of concepts is mostly preserved regardless of α. Another class of indices, utilizing the similarity and predictability approaches (the Basic Level Metrics group), yields the second group. It should be noted that the highly correlated indices of this group are the measures based on the similarity approach with the same cohesion function and with different aggregation functions for sub/superordinate concepts. This conclusion agrees with the results presented in [6]. The other groups of correlated indices are highlighted in Table 4.
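The pairwise comparison of index-induced rankings can be sketched as follows; the index values below are hypothetical, and without ties this simple implementation coincides with the usual tau-a coefficient:

```python
# Sketch: Kendall tau between the rankings induced by two indices on the
# same concepts (tau-a; assumes no tied values).
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total, over all pairs of positions."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = sum(1 for i, j in pairs
                     if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)

stability_vals = [0.9, 0.7, 0.8, 0.2, 0.5, 0.4]  # hypothetical index values
delta_l_vals   = [0.8, 0.6, 0.9, 0.1, 0.4, 0.3]

print(kendall_tau(stability_vals, delta_l_vals))  # 13/15: near-identical rankings
```

A value near 1 means the two indices rank the concepts almost identically, so the cheaper of the two can stand in for the other.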
Stability (robustness) is the most computationally expensive index, hence it is important to identify more easily computable indices and use them instead. In our experiments we found that the logarithmic estimates of stability (∆_l, ∆_h) and the Max-distinguished-extents upper bound (stab_OE) are strongly correlated with stability; the highly correlated robustness indices belong to the same complexity class as stability.

Here, we consider the problem of stability approximation w.r.t. different levels of the integral stability index, i.e., the number of levels that are used to compute the index. Since the extent size of concepts varies in the range from 0 to |G|, for each concept we took the level of stability depending on the size of the set A; more precisely, the level is defined by the formula [rate · |A|], where 0 < rate < 1. We used randomly generated formal contexts (described in the previous subsection) of different densities, 0.1, 0.2, 0.3, and applied simple linear regression. The integral stability index of the j-th level was taken as a regressor and stability as the dependent variable.

Scatter plots of the studied indices are given in Figure 3. As we can see from the diagrams, taking too small rates does not allow us to estimate stability, since for most concepts small-sized subsets Y ⊂ A such that Y′ = B do not exist. The first local maximum of the coefficients of determination for the model stability = A · stability_Σ^ratio + B

Table 4: The averaged Kendall tau coefficient for indices. Statistically equal average values are in bold type.
[Table 4 body: the pairwise Kendall tau values for stability, ∆_l, ∆_h, stab_NOE, stab_OE, stab_OIE, four robustness variants, probability, separation, support, marg-clos*, the S^SMC and S^J similarity measures, P, CV, CFC and CU; the numeric entries are not reproduced here.]

Fig. 3: The dependence of stability values on integral stability values

corresponds to rate = 0.

In practice one usually faces noisy data. Even a small noise rate can result in an exponential explosion of the number of formal concepts [29]. In this connection, we study the ability of indices to select original concepts from the set of concepts computed on a noisy context. We took 5 formal contexts; the lattices of the first four have a quite simple structure (see Figure 4), and the last one is a fragment of the Mushroom dataset consisting of 500 objects and 14 attributes.

Fig. 4: The concept lattices of formal contexts with 300 objects and 6 attributes (a-c) and with 400 objects and 4 attributes (d)

We added different amounts of noise into the contexts as follows: each entry changes with a given probability (the noise rate).
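The noise model just described, every cell of the binary context flipped independently with probability equal to the noise rate, can be sketched as follows (the context and the seed are hypothetical):

```python
# Sketch of the noise model: each 0/1 entry of a binary context flips
# independently with probability noise_rate.
import random

def add_noise(context, noise_rate, rng):
    """context: list of rows of 0/1 cells; returns a noisy copy."""
    return [[1 - cell if rng.random() < noise_rate else cell
             for cell in row]
            for row in context]

rng = random.Random(0)            # fixed seed for reproducibility
clean = [[1, 0, 1], [0, 1, 1]]
noisy = add_noise(clean, 0.1, rng)
print(noisy)
```

Running the concept-mining step on `noisy` and comparing the resulting concepts with those of `clean` yields the binary classification setting evaluated below.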
For the “noisy” lattices we identified original concepts and considered a binary classification problem. We computed the AUC (area under the receiver operating characteristic curve) for each index separately. The AUCs averaged within groups corresponding to the same datasets with different noise rates, and within groups with the same noise rate, for the contexts described above are given in Figure 5.

As can be seen in Figure 5, the index quality depends mostly on the lattice structure rather than on the noise rate. Cue Validity (CV), Category Feature Collocation (CFC), Category Utility (CU) and separation have the highest AUC on particular datasets, but their quality changes considerably depending on the lattice structure. For instance, the AUC of CFC varies from 0.67 to 0.95. The most stable results correspond to robustness with several values of α; the estimates of stability are able to distinguish most of the original concepts (the AUC is greater than 0.7).

Almost all indices are stable with respect to different noise rates. Poor data quality has a strong impact on the estimates of stability: the quality of these indices drops as the noise rate increases. Since these measures are based on the elements of a lattice, the more noise is introduced, the more noisy concepts are involved in the computation. The quality of similarity-based indices (the Basic Level group) is close to random guessing, which makes them inapplicable to the analysis of noisy data. The experiments allow us to conclude that CV, CU, CFC, separation and robustness are the most suitable for the analysis of noisy data.

(The Mushroom dataset is available at https://archive.ics.uci.edu/ml/datasets/Mushroom.)

Fig. 5: The averaged AUC within groups with the same noise rate (dotted lines) and corresponding to the same datasets (lines a-d)

In this paper we have presented some results on formal concept indices. We have also analyzed the existing indices for closed itemsets in Data Mining.
We defined the main features of the indices and proposed their classification using an FCA-based approach. We have also provided examples of the most interesting groups of indices selected by means of some of the studied indices.

We have given basic ideas for adapting indices for arbitrary itemsets to closed ones. We have suggested utilizing the bimodal nature of concepts to obtain new indices on the basis of the indices for arbitrary sets of attributes.

An important part of the study is devoted to the practical application of indices. We have performed a comparative study of indices in the context of the following tasks: selection of the most interesting concepts, approximation of the exponentially hard indices (stability), and filtering noisy data. The results of our experiments allow us to distinguish groups of correlated indices; these results can thus be used to reduce the computational complexity of concept mining by choosing easily computable indices among the correlated ones. Another important aspect of our study is the identification of indices that can be used for the analysis of noisy data. It was shown that the noise filtering quality of indices depends more on the structure of the concept lattice than on the noise rate. The strongest dependence of the filtering quality on the noise rate corresponds to the estimates of stability. Even a small proportion of noise added to the data can significantly change the lattice structure, which results in biased values of neighbor-based indices.

As possible directions of future work, we propose the study of indices for approximate concepts, e.g., biclusters, as well as indices for multimodal concepts. Another important application of indices, as tools for selecting hypotheses, can be examined within the framework of classification and learning.
Acknowledgments
This paper was prepared within the framework of the Basic Research Programat the National Research University Higher School of Economics (HSE) and sup-ported within the framework of a subsidy by the Russian Academic ExcellenceProject ’5-100’.