LinCbO: fast algorithm for computation of the Duquenne-Guigues basis
Radek Janostik, Jan Konecny*, Petr Krajča
Dept. Computer Science, Palacký University Olomouc, 17. listopadu 12, CZ–77146 Olomouc, Czech Republic
Abstract
We propose and evaluate a novel algorithm for computation of the Duquenne-Guigues basis which combines the Close-by-One and LinClosure algorithms. This combination enables us to reuse attribute counters used in LinClosure and speed up the computation. Our experimental evaluation shows that it is the most efficient algorithm for computation of the Duquenne-Guigues basis.
Keywords: non-redundancy; attribute implications; minimalization; closures.
1. Introduction
Formal Concept Analysis [14, 12] (FCA) has two main outputs: (i) a hierarchy of formal concepts in the input data, called a concept lattice, and (ii) a non-redundant system of attribute implications, called a basis, describing the input data. For both of these outputs, closure systems are the fundamental structures behind the related theory and algorithms.

Many algorithms for computing closure systems exist [12, 20]. Among the most efficient are variants of Kuznetsov's Close-by-One (CbO) [18], namely Outrata & Vychodil's FCbO [24] and Andrews's In-Close family of algorithms [1, 2, 3, 4, 5]. These are commonly used for the enumeration of formal concepts, as both their parts, extents and intents, form closure systems.

When considering systems of attribute implications, pseudo-intents play an important role, since they determine the minimal basis, called the Duquenne-Guigues basis or canonical basis [16]. The pseudo-intents, together with the intents of formal concepts, form a closure system. Enumerating all pseudo-intents (together with intents) is more challenging, as it requires a particular restriction of the order of the computation, and the results on complexity are all but promising [19]. There are basically two main approaches for this task: NextClosure by Ganter [15, 14], and the incremental approach by Obiedkov and Duquenne [23].

We present a new approach based on the CbO algorithm and LinClosure [21]. Put simply, we enumerate the members of the closure system (intents and pseudo-intents) using CbO, while each member is computed using LinClosure. We show that in our approach, LinClosure is able to reuse attribute counters from previous computations. This makes it work very fast, as our experiments show.

The rest of the paper has the following structure: First, we recall basic notions of FCA (Section 2.1), closure operators (Section 2.2), bases of attribute implications (Section 2.3), the algorithms CbO and NextClosure (Section 2.4), and the algorithms LinClosure (Section 2.5) and Wild's closure (Section 2.6). Second, we introduce our approach, which includes CbO with a changed sweep order (Section 3.1) and improvements previously introduced into NextClosure in [6] (Section 3.2). Most importantly, we describe a feature which enables LinClosure to reuse the attribute counters (Section 3.3). Then, we experimentally evaluate the resulting algorithm (Section 4) and discuss our observations (Section 4.3). Finally, we summarize our conclusions and present ideas for further research (Section 5).

(* Corresponding author)
Email addresses: [email protected] (Radek Janostik), [email protected] (Jan Konecny), [email protected] (Petr Krajča)

Preprint submitted to Elsevier, January 25, 2021
2. Preliminaries
Here, we recall notions used in the rest of the paper.
An input to FCA is a triplet ⟨X, Y, I⟩, called a formal context, where X, Y are non-empty sets of objects and attributes, respectively, and I is a binary relation between X and Y. The presence of an object-attribute pair ⟨x, y⟩ in the relation I means that the object x has the attribute y. Finite contexts are usually depicted as tables, in which rows represent objects in X, columns represent attributes in Y, and ones in the entries mean that the corresponding object-attribute pair is in I.

The formal context ⟨X, Y, I⟩ induces so-called concept-forming operators:

↑: 2^X → 2^Y assigns to a set A of objects the set A↑ of all attributes shared by all the objects in A.
↓: 2^Y → 2^X assigns to a set B of attributes the set B↓ of all objects which share all the attributes in B.

Formally, for all A ⊆ X, B ⊆ Y we have

  A↑ = {y ∈ Y | ∀x ∈ A: ⟨x, y⟩ ∈ I},
  B↓ = {x ∈ X | ∀y ∈ B: ⟨x, y⟩ ∈ I}.

Fixed points of the concept-forming operators, i.e. pairs ⟨A, B⟩ ∈ 2^X × 2^Y satisfying A↑ = B and B↓ = A, are called formal concepts. The sets A and B in a formal concept ⟨A, B⟩ are called the extent and the intent, respectively. The set of all intents in ⟨X, Y, I⟩ is denoted by Int(X, Y, I).

An attribute implication is an expression of the form L ⇒ R, where L, R ⊆ Y are sets of attributes. We say that L ⇒ R is valid in a set of attributes M ⊆ Y if L ⊆ M implies R ⊆ M. The fact that L ⇒ R is valid in M is written as ||L ⇒ R||_M = 1. An attribute implication L ⇒ R is valid in a context ⟨X, Y, I⟩ if it is valid in every object intent {x}↑, i.e. ||L ⇒ R||_{x}↑ = 1 for all x ∈ X. A set of attribute implications is called a theory. A set of attributes M is called a model of a theory T if every attribute implication in T is valid in M.

(Footnote: LinClosure is an algorithm for computation of the smallest model of a theory containing a given set of attributes. It uses so-called attribute counters to avoid set comparisons and reach a linear time complexity. We recall this in Section 2.5.)
The set of all models of T is denoted Mod(T), i.e.

  Mod(T) = {M | ∀ L ⇒ R ∈ T: ||L ⇒ R||_M = 1}.

A closure system in a set Y is any system S of subsets of Y which contains Y and is closed under arbitrary intersections. A closure operator on a set Y is a mapping c: 2^Y → 2^Y satisfying, for each A, A1, A2 ⊆ Y,

  A ⊆ c(A),                           (1)
  A1 ⊆ A2 implies c(A1) ⊆ c(A2),      (2)
  c(A) = c(c(A)).                     (3)

Closure systems and closure operators are in one-to-one correspondence. Specifically, for a closure system S in Y, the mapping c_S: 2^Y → 2^Y defined by

  c_S(A) = ⋂{B ∈ S | A ⊆ B}

is a closure operator. Conversely, for a closure operator c on Y, the set S_c = {A ⊆ Y | c(A) = A} is a closure system. Furthermore, S_{c_S} = S and c_{S_c} = c.

For a formal context ⟨X, Y, I⟩, the set Int(X, Y, I) of its intents is a closure system. The corresponding closure operator, c_{Int(X,Y,I)}, is equal to the composition ↓↑ of the concept-forming operators.

For any theory T, the set Mod(T) of its models is a closure system. The corresponding closure operator, c_{Mod(T)}, is equal to the following operator c_T. For Z ⊆ Y and theory T, put

  1. Z^T = Z ∪ ⋃{R | L ⇒ R ∈ T, L ⊆ Z},
  2. Z^{T_0} = Z,  Z^{T_n} = (Z^{T_{n−1}})^T,

and define the operator c_T: 2^Y → 2^Y by

  c_T(Z) = ⋃_{n=0}^∞ Z^{T_n}.

A theory T is called
• complete in ⟨X, Y, I⟩ if Mod(T) = Int(X, Y, I);
• a basis of ⟨X, Y, I⟩ if no proper subset of T is complete in ⟨X, Y, I⟩.

A set P ⊆ Y of attributes is called a pseudo-intent if it satisfies the following conditions:
(i) it is not an intent, i.e. P↓↑ ≠ P;
(ii) for all smaller pseudo-intents P′ ⊂ P, we have P′↓↑ ⊂ P.

Theorem 1.
Let P be the set of all pseudo-intents of ⟨X, Y, I⟩. The set

  {P ⇒ P↓↑ | P ∈ P}

is a basis of ⟨X, Y, I⟩. Additionally, it is a minimal basis in terms of the number of attribute implications. This basis is called the Duquenne-Guigues basis.

Let P be the set of all pseudo-intents of ⟨X, Y, I⟩. The union Int(X, Y, I) ∪ P is a closure system on Y. The corresponding closure operator c̃_T is given as follows. For Z ⊆ Y and theory T, put

  1. Z^T = Z ∪ ⋃{R | L ⇒ R ∈ T, L ⊂ Z},
  2. Z^{T_0} = Z,  Z^{T_n} = (Z^{T_{n−1}})^T,

and define the operator c̃_T: 2^Y → 2^Y by

  c̃_T(Z) = ⋃_{n=0}^∞ Z^{T_n}.    (4)

The algorithm which follows the above definition is called the naïve algorithm. There are more sophisticated ways to compute closures, like LinClosure [21], Wild's closure [26], and SLFD-closure [22].

Note that the definition of c̃_T differs from the definition of c_T in Section 2.2 only in the subsethood in item 1 – the operator c_T allows equality in this item while c̃_T does not. In what follows, we use the shortcut Z• for c̃_T(Z).

Let Z be a set of attributes and S be a subset of attribute implications such that

  • all implications L ⇒ R ∈ T with L ⊂ Z• are in S,
  • no attribute implication L ⇒ R ∈ T with L = Z• is in S.    (5)

Then, we clearly have c_S(Z) = Z•.

This gives a basic picture of how we compute the Duquenne-Guigues basis T: starting with S = ∅, we compute c_S(Z) for a set Z for which S satisfies the conditions (5). If Z• is a pseudo-intent, we update S by adding the attribute implication Z• ⇒ Z•↓↑, and repeat for other sets Z. When all plausible sets are processed, S is the Duquenne-Guigues basis T.

Therefore, the intents and pseudo-intents must be enumerated in an order ≤ which extends the subsethood, i.e.

  C1 ⊆ C2 implies C1 ≤ C2 for all C1, C2 ∈ Int(X, Y, I) ∪ P.    (6)

NextClosure enumerates closed sets in the so-called lectic order. We obtain the lectic order of sets when we order their characteristic vectors as binary numbers.
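The naïve algorithm for c̃_T above can be made concrete with a minimal Python sketch; the representation of a theory as a list of (L, R) pairs of frozensets is our own, not from the paper:

```python
def tilde_closure(Z, T):
    """Naive computation of ~c_T(Z): repeatedly fire every implication
    L => R whose left-hand side is a PROPER subset of the current set.
    (The strict inclusion is what distinguishes ~c_T from c_T, which
    also fires implications with L equal to the current set.)"""
    Z = set(Z)
    changed = True
    while changed:
        changed = False
        for L, R in T:
            if L < Z and not R <= Z:   # L proper subset, R not yet added
                Z |= R
                changed = True
    return frozenset(Z)
```

For example, with T = [({1}, {2}), ({1, 2}, {3})], the set {1} is already c̃_T-closed (the left-hand side {1} is not a proper subset of {1}), while {1, 4} closes to {1, 2, 3, 4}.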
The lectic order satisfies (6); that is why NextClosure [14] (described at the end of Section 2.4) is most frequently used for the computation of the Duquenne-Guigues basis.

2.4. Close-by-One and NextClosure

We assume a closure operator c on the set Y = {1, 2, ..., n}. Whenever we write about lower attributes or higher attributes, we refer to the natural ordering of the numbers in Y.

We start the description of CbO with a basic algorithm for generating all closed sets (Algorithm 1). The basic algorithm traverses the space of all subsets of Y; each subset is checked for closedness and is outputted. This approach is quite inefficient, as the number of closed subsets is typically significantly smaller than the number of all subsets.

Algorithm 1:
Basic algorithm to enumerate closed subsets

def GenerateFrom(B, y):
    input: B – set of attributes
           y – last added attribute
1   if B = c(B) then
2       print(B)
3   for i ∈ {y+1, ..., n} do
4       D ← B ∪ {i}
5       GenerateFrom(D, i)

GenerateFrom(∅, 0)
The algorithm is given by a recursive procedure GenerateFrom, which accepts two arguments:

• B – the set of attributes from which new sets will be generated;
• y – the auxiliary argument to remember the highest attribute in B.

The procedure first checks the input set B for closedness and prints it if it is closed (lines 1, 2). Then, for each attribute i higher than y:

• a new set is generated by adding the attribute i into the set B (line 4);
• the procedure recursively calls itself to process the new set (line 5).

The procedure is initially called with the empty set and zero as its arguments. The basic algorithm represents a depth-first sweep through the tree of all subsets of Y (see Fig. 1), printing the closed ones.
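A runnable sketch of Algorithm 1 in Python may be helpful; the toy context and the helper closure operator below are our own illustration, not from the paper:

```python
ROWS = [{1, 2}, {2, 3}, {1, 2, 3}]   # toy context: object intents (our own example)
N = 3                                # Y = {1, ..., N}

def c(B):
    """Intent closure B -> B^(down,up) of the toy context: intersect the
    intents of all objects that have every attribute of B."""
    out = set(range(1, N + 1))
    for row in ROWS:
        if B <= row:
            out &= row
    return frozenset(out)

def generate_from(B, y, out):
    """Algorithm 1: visit every subset of Y, keep the closed ones."""
    if c(B) == B:
        out.append(B)
    for i in range(y + 1, N + 1):
        generate_from(frozenset(B | {i}), i, out)

closed = []
generate_from(frozenset(), 0, closed)
```

For this context the sweep visits all 8 subsets but collects only the 4 closed ones: {2}, {1, 2}, {2, 3}, and {1, 2, 3}.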
Figure 1: Tree of all subsets of {1, 2, 3, 4}. Each node represents a unique set containing all elements on the path from the node to the root. The dotted arrows and small numbers represent the sweep performed by the CbO algorithm.

In the tree of all subsets (Fig. 1), each node is a superset of its predecessors. We can use the closure operator ↓↑ to skip non-closed sets, in other words, to make jumps in the tree to closed sets only. CbO can be seen as the basic algorithm with closure jumps: instead of simply adding an element to generate a new subset D ← B ∪ {i}, CbO adds the element and then closes the set:

  D ← c(B ∪ {i}).    (7)

We need to distinguish the two outcomes of the closure (7). Either

• the closure contains some attributes lower than i which are not included in B, i.e. D_i ≠ B_i, where D_i = D ∩ {1, ..., i−1} and B_i = B ∩ {1, ..., i−1};
• or it does not, and we have D_i = B_i.

The jumps with D_i ≠ B_i are not desirable because they land on a closed set which was already processed or will be processed later (depending on the direction of the sweep). CbO does not perform such jumps. The check of the condition D_i = B_i is called a canonicity test.

One can see the pseudocode of CbO in Algorithm 2. We describe the differences from the basic algorithm:

Algorithm 2: Close-by-One
def CbOStep(B, y):
    input: B – closed set
           y – last added attribute
1   print(B)
2   for i ∈ {y+1, ..., n} \ B do
3       D ← c(B ∪ {i})
4       if D_i = B_i then
5           CbOStep(D, i)

CbOStep(c(∅), 0)

• The argument B is a closed set; therefore, unlike GenerateFrom, the procedure can print it directly without testing (line 1).
• In the loop, we skip elements already present in B (line 2).
• The recursive invocation is made only if the new closed set D passes the canonicity test (lines 3, 4).
• The initial invocation is made with the smallest closed set c(∅) instead of the empty set.

The algorithm NextClosure [14] is another algorithm for enumerating closed sets. NextClosure is represented by the procedure NextClosure (Algorithm 3), which accepts a closed set B and returns another closed set, which is the lectic successor of the input set. It starts with a set B containing all attributes of the input set. It processes the attributes in Y in descending order (line 2).

1. If the processed attribute is in B, it removes it (lines 3, 4).
2. If the processed attribute is not in B, it computes the closure D of B ∪ {i} (lines 5, 6).

Note that the above effectively increases the binary number corresponding to the characteristic vector of B by one and closes it; this corresponds to the description of the lectic order via binary numbers. Then, the set D is tested for canonicity the same way as in CbO. If D passes the test, it is returned as the result (line 7). Otherwise, we continue processing the other attributes. If we exhaust all attributes, we return Y as the lectically last closed set.

To enumerate all formal concepts, the NextClosure algorithm starts with the least closed set c(∅) and in consecutive steps applies this procedure to obtain the next formal concepts. The algorithm stops when Y is obtained.

Algorithm 3:
NextClosure

def NextClosure(B):
    input: B – set of attributes
1   B ← B
2   for all i ∈ Y (in descending order) do
3       if i ∈ B then
4           B ← B \ {i}
5       else
6           D ← c(B ∪ {i})
7           if B_i = D_i then return D
8   return Y

NextClosure can be seen as an iterative version of CbO with the right depth-first sweep through the tree of all subsets. From this point of view, the above item 1 is equivalent to backtracking in the tree of all subsets, and item 2 is CbO's adding and closing. The consequent test of canonicity is the same as in CbO.
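The following Python sketch (with a toy context of our own) implements both CbO with the canonicity test and NextClosure; running both confirms that they enumerate the same closed sets:

```python
ROWS = [{1, 2}, {2, 3}, {1, 2, 3}]   # toy context: object intents (our own example)
N = 3                                # Y = {1, ..., N}

def c(B):
    """Intent closure of the toy context."""
    out = set(range(1, N + 1))
    for row in ROWS:
        if B <= row:
            out &= row
    return frozenset(out)

def cbo_step(B, y, out):
    """Algorithm 2 (CbO): B is closed; recurse only when the
    canonicity test D_i = B_i passes."""
    out.append(B)
    for i in range(y + 1, N + 1):
        if i in B:
            continue
        D = c(B | {i})
        low = set(range(1, i))
        if D & low == B & low:       # canonicity test
            cbo_step(D, i, out)

def next_closure(B):
    """Algorithm 3: lectic successor of the closed set B."""
    B = set(B)
    for i in range(N, 0, -1):
        if i in B:
            B.discard(i)
        else:
            D = c(B | {i})
            low = set(range(1, i))
            if B & low == D & low:   # canonicity test
                return D
    return frozenset(range(1, N + 1))

cbo_out = []
cbo_step(c(frozenset()), 0, cbo_out)

nc_out, B, Y = [], c(frozenset()), frozenset(range(1, N + 1))
nc_out.append(B)
while B != Y:
    B = next_closure(B)
    nc_out.append(B)
```

Both runs produce the four closed sets of the toy context, though in different orders: CbO with the left-first sweep visits {2}, {1, 2}, {1, 2, 3}, {2, 3}, while NextClosure visits them in the lectic order {2}, {2, 3}, {1, 2}, {1, 2, 3}.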
2.5. LinClosure

LinClosure (Algorithm 4) [7, 21] accepts a set B of attributes for which it computes the T-closure c_T(B). The theory T is considered to be a global variable. It starts with a set D containing all elements of B (line 1). If there is an attribute implication in T with an empty left-hand side, D is united with its right-hand side (lines 2, 3). LinClosure associates a counter count[L ⇒ R] with each L ⇒ R ∈ T, initializing it with the size |L| of its left-hand side (lines 4, 5). Also, each attribute y ∈ Y is linked to a list of the attribute implications that have y in their left-hand sides (lines 6, 7). (This needs to be done just once and it is usually done outside the LinClosure procedure.) Then, the set Z of attributes to be processed is initialized as a copy of the set D (line 8). While there are attributes in Z, the algorithm chooses one of them (min in the pseudocode, line 10), removes it from Z (line 11), and decrements the counters of all attribute implications linked to it (lines 12, 13). If the counter of any attribute implication L ⇒ R is decreased to 0, the new attributes from R are added to D and to Z.

Algorithm 4:
LinClosure

def LinClosure(B):
    input: B – set of attributes
1   D ← B
2   if ∃ ∅ ⇒ R ∈ T for some R then
3       D ← D ∪ R
4   for all L ⇒ R ∈ T do
5       count[L ⇒ R] ← |L|
6       for all a ∈ L do
7           add L ⇒ R to list[a]
8   Z ← D
9   while Z ≠ ∅ do
10      m ← min(Z)
11      Z ← Z \ {m}
12      for all L ⇒ R ∈ list[m] do
13          count[L ⇒ R] ← count[L ⇒ R] − 1
14          if count[L ⇒ R] = 0 then
15              add ← R \ D
16              D ← D ∪ add
17              Z ← Z ∪ add
18  return D

We are going to use the algorithm LinClosure in CbO. CbO drops the resulting closed set if it fails the canonicity test (Algorithm 2, lines 4, 5). Therefore, we can introduce a feature – early stop – which stops the computation whenever an attribute which would cause the failure is added into the set. To do that, we add a new input argument, y, having the same role as in CbO, i.e. the last attribute added into the set (Algorithm 5). Then, whenever new attributes are added to the set, we check whether any of them is lower than y. If so, we stop the procedure and return the information that the canonicity test would fail (lines 16, 17). This feature is also utilized in [6].
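A Python sketch of LinClosure as described above; the dictionary-based attribute lists and the (L, R)-pair theory are our own representation, and the lists are rebuilt on every call here, although the paper notes they are normally built just once, outside the procedure:

```python
def lin_closure(B, T):
    """Sketch of LinClosure (Algorithm 4). T is a list of (L, R) pairs.
    count[j] holds how many attributes of L_j are still missing from the
    processed attributes; when it drops to 0, R_j is added."""
    D = set(B)
    for L, R in T:
        if not L:                        # empty left-hand side fires at once
            D |= R
    count = [len(L) for L, _ in T]
    lists = {}                           # attribute -> implication indices
    for j, (L, _) in enumerate(T):
        for a in L:
            lists.setdefault(a, []).append(j)
    Z = set(D)
    while Z:
        m = min(Z)
        Z.discard(m)
        for j in lists.get(m, []):
            count[j] -= 1
            if count[j] == 0:            # whole left-hand side processed
                add = T[j][1] - D
                D |= add
                Z |= add
    return frozenset(D)
```

With T = [({1}, {2}), ({2, 3}, {4})], closing {1, 3} first fires {1} ⇒ {2} and then, once 2 and 3 are both processed, {2, 3} ⇒ {4}, yielding {1, 2, 3, 4}.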
In the pseudocode of LinClosure with an early stop (Algorithm 5), we also removed the two lines which handled the case of an attribute implication in T with an empty left-hand side (Algorithm 4, lines 2, 3). In Section 3.2, we introduce an improvement for CbO which makes the two lines superfluous.

Algorithm 5:
LinClosure with an early stop

def LinClosureES(B, y):
    input: B – set of attributes
           y – last attribute added to B
1   D ← B
2   if ∃ ∅ ⇒ R ∈ T for some R then
3       D ← D ∪ R
4   for all L ⇒ R ∈ T do
5       count[L ⇒ R] ← |L|
6       for all a ∈ L do
7           add L ⇒ R to list[a]
8   Z ← D
9   while Z ≠ ∅ do
10      m ← min(Z)
11      Z ← Z \ {m}
12      for all L ⇒ R ∈ list[m] do
13          count[L ⇒ R] ← count[L ⇒ R] − 1
14          if count[L ⇒ R] = 0 then
15              add ← R \ D
16              if min(add) < y then
17                  return fail
18              else
19                  D ← D ∪ add
20                  Z ← Z ∪ add
21  return D

2.6. Wild's closure

For the sake of completeness, we also describe Wild's closure [26]. Our algorithm does not use this closure; however, the algorithms NC3 and NC⁺3 in the experimental comparison do. Wild's closure (Algorithm 6) accepts a set B of attributes for which it computes the T-closure c_T(B). The theory T is considered to be a global variable.

It starts with a set D containing all elements of B (line 1). First, it handles the case of an attribute implication with an empty left-hand side, the same way LinClosure does (lines 2, 3). Wild's closure maintains implication lists, similarly to LinClosure (lines 4–6). It keeps a set N of current attribute implications, initially equal to T (line 7). It uses the attribute lists to find a subset N1 ⊆ N of implications whose left-hand side has an attribute not occurring in D (line 10). It uses the rest N \ N1 of the implications to extend D. If D is extended, the process is repeated with N1 being the set of current implications (loop at lines 8–15). Otherwise, D is the resulting set and is returned (line 16).

Algorithm 6:
Wild's closure

def WildClosure(B):
    input: B – set of attributes
1   D ← B
2   if ∃ ∅ ⇒ R ∈ T for some R then
3       D ← D ∪ R
4   for all L ⇒ R ∈ T do
5       for a ∈ L do
6           add L ⇒ R to list[a]
7   N ← T
8   repeat
9       stable ← true
10      N1 ← ⋃_{a ∉ D} list[a]
11      for all L ⇒ R ∈ N \ N1 do
12          D ← D ∪ R
13          stable ← false
14      N ← N1
15  until stable
16  return D
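A Python sketch of Wild's closure; for simplicity, direct subset tests replace the attribute lists of Algorithm 6, which preserves the round structure but not the exact bookkeeping:

```python
def wild_closure(B, T):
    """Sketch of Wild's closure (Algorithm 6). In every round, the
    implications whose left-hand side misses some attribute of D are set
    aside (N1); all remaining implications fire. The next round works
    only with N1, until D stops growing."""
    D = set(B)
    for L, R in T:
        if not L:                        # empty left-hand side fires at once
            D |= R
    N = list(T)
    stable = False
    while not stable:
        stable = True
        fire = [(L, R) for L, R in N if L <= D]       # N \ N1
        N1 = [(L, R) for L, R in N if not L <= D]
        for L, R in fire:
            if not R <= D:
                D |= R
                stable = False
        N = N1
    return frozenset(D)
```

On the same inputs as LinClosure it returns the same closures, e.g. {1, 3} closes to {1, 2, 3, 4} under T = [({1}, {2}), ({2, 3}, {4})]; the difference lies only in how much scanning each round performs.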
Figure 2: Tree of all subsets of {1, 2, 3, 4}. Each node represents a unique set containing all elements on the path from the node to the root. The dotted arrows and small numbers represent the sweep performed by the CbO algorithm with the right depth-first sweep.
3. LinCbO: CbO-based algorithm for computation of the Duquenne-Guigues basis
In this section, we describe the algorithm LinCbO. Its foundation is CbO (Algorithm 2) combined with LinClosure (Algorithm 4). We explain the changes in the CbO algorithm: a change of the sweep order makes the algorithm work, and the rest of the changes improve its efficiency.
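The role of the sweep order can be checked with a small Python sketch of our own: the right depth-first sweep through the tree of all subsets visits the subsets exactly in the lectic order (the encoding of characteristic vectors, with attribute 1 as the most significant bit, is our own):

```python
N = 4   # Y = {1, ..., N}

def right_dfs(B, y, out):
    """Right depth-first sweep through the tree of all subsets of Y:
    extensions by HIGHER attributes are explored first."""
    out.append(frozenset(B))
    for i in range(N, y, -1):            # descending, i.e. right-first
        right_dfs(B | {i}, i, out)

def lectic_key(S):
    """Characteristic vector of S read as a binary number, with
    attribute 1 as the most significant bit."""
    return sum(1 << (N - a) for a in S)

visited = []
right_dfs(set(), 0, visited)
keys = [lectic_key(S) for S in visited]
# keys is strictly increasing: the sweep follows the lectic order
```

Since the lectic order extends the subsethood ordering, this is exactly the property (6) required for enumerating intents and pseudo-intents.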
In the previous section, we presented CbO as the left-first sweep through the tree of all subsets. This is how it is usually described. In ordinary settings, there is no need to follow a particular order of the sweep. However, our aim is to compute intents and pseudo-intents using the closure operator c̃_T (4), or, more exactly, the closure operator c_S for S ⊆ T satisfying (5). For this, we need to utilize an order which extends the subsethood, i.e. (6). The right depth-first sweep through the tree of all subsets satisfies this condition (see Fig. 2). Observe that with the right depth-first sweep, we obtain exactly the lectic order, i.e. the same order in which NextClosure explores the search space.

The following improvements were introduced to NextClosure [6] and the incremental approach [23] for computation of pseudo-intents. We incorporated them into the CbO algorithm. After the algorithm computes B•, the implication B• ⇒ B↓↑ is added to T, provided B• is a pseudo-intent, i.e. B• ≠ B↓↑. Note that there exists the smallest c̃_T-closed set larger than B•, and it is the intent B•↓↑ (= B↓↑). Consider the following two cases:

(o1) This intent satisfies the canonicity test, i.e. (B↓↑)_y = (B•)_y, where y is the last attribute added to B. Then we can jump to this intent.
(o2) This intent does not satisfy the canonicity test. Thus, we can leave the present subtree.

Now, let us describe the first version of LinCbO (Algorithm 7), which includes the above discussed improvements. The procedure
LinCbO1Step works with the following global variables: an initially empty theory T and an initially empty list of attribute implications for each attribute. LinCbO1Step accepts two arguments: a set B of attributes and the last attribute y added to B. The set B is not generally closed (which was the case in Algorithm 2).

The procedure first applies LinClosure with an early stop (Algorithm 5) to compute B• (line 1). If B• fails the canonicity test (recall that the canonicity test is incorporated in LinClosure with an early stop), the procedure stops (lines 2, 3). Then, the procedure computes B•↓↑ to check whether B• is an intent or a pseudo-intent (line 4). If it is a pseudo-intent, a new attribute implication B• ⇒ B•↓↑ is added to the initially empty theory T (line 5). For each attribute in B•, we update its list by adding the new attribute implication (lines 6 and 7).

Now, as we have computed the intent B•↓↑, we can apply (o1) or (o2) based on the result of the canonicity test (B•↓↑)_y = (B•)_y (line 8) – either we call LinCbO1Step for B•↓↑ (line 9), or we end the procedure. If B• is an intent, we recursively call LinCbO1Step for all sets B• ∪ {i}, where i is higher than the last added attribute y and is not already present in B•. To obtain the lectic order, we make the recursive calls in descending order of i.

The procedure LinCbO1Step is initially called with the empty set of attributes and zero representing an invalid last added attribute.

Now we can explain why we removed the part of the code of LinClosure which handles the case
∅ ⇒ R ∈ T (Algorithm 4, lines 2, 3) from LinClosure with an early stop. The presence of ∅ ⇒ R in T means that ∅ is a pseudo-intent. This pseudo-intent is generated by the initial invocation of LinCbO1Step. Since for the initial invocation we have y =
0, the intent ∅↓↑ = R trivially satisfies the condition R_y = ∅_y (Algorithm 7, line 8) and LinCbO1Step is invoked with this intent (Algorithm 7, line 9). Consequently, all the processed sets are supersets of R, and therefore the union with R (Algorithm 4, line 3) does nothing.

(Footnote: As CbO with the right depth-first sweep can be considered a recursive NextClosure, this version of LinCbO can be considered a recursive version of the corresponding algorithm from [6], denoted NC⁺2 below.)

Algorithm 7: LinCbO1 (CbO for the Duquenne-Guigues basis, first version)

T ← ∅
list[i] ← ∅ for each i ∈ Y

def LinCbO1Step(B, y):
    input: B – set of attributes
           y – last attribute added to B
1   B• ← LinClosureES(B, y)
2   if B• is fail then
3       return
4   if B• ≠ B•↓↑ then
5       T ← T ∪ {B• ⇒ B•↓↑}
6       for i ∈ B• do
7           list[i] ← list[i] ∪ {B• ⇒ B•↓↑}
8       if (B•↓↑)_y = (B•)_y then
9           LinCbO1Step(B•↓↑, y)
10  else
11      for i from n down to y+1, i ∉ B• do
12          LinCbO1Step(B• ∪ {i}, i)

LinCbO1Step(∅, 0)

Consider a theory T1 and a theory T2 which emerges by adding new attribute implications to T1, i.e. T1 ⊆ T2. When we compute the T1-closure B1 of a set B, we can store the values of the attribute counters at the end of the LinClosure procedure. Later, when we compute the T2-closure of a superset B2 of B1, we can initialize the attribute counters of the implications from T1 to the stored values instead of the antecedent sizes. Attribute counters for the new implications, i.e. those in T2 \ T1, are initialized the usual way. Then, we handle only the new attributes, that is, those in B2 \ B1.

We can improve LinClosure accordingly (Algorithm 8). We describe only the differences from LinClosure with an early stop (Algorithm 5). It accepts two additional arguments: Z – the set of new attributes, i.e. those which were not in the T1-closed subset from which we reuse the counters; and prevCount – the previous counters to be reused. We copy the previous counters and the new attributes Z to local variables (lines 2, 3).
Furthermore, we initialize the counters of the newly added attribute implications (lines 4, 5).

Note that in CbO we always make the recursive invocations for supersets of the current set (see Algorithm 7, lines 9 and 12). Therefore, we can easily utilize LinClosure with reused counters in LinCbO (Algorithm 9). The only difference from the first version (Algorithm 7) is that the procedure LinCbOStep accepts two additional arguments, which are passed to the procedure
LinClosureRC (line 1). The two arguments are the set of new attributes and the previous attribute counters (both initially empty). Recall that the attribute counters are modified by LinClosure. The corresponding arguments are also passed to the recursive invocations of
LinCbOStep (lines 9 and 12).
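The whole scheme can be sketched in Python as follows; this is our own compact rendition of Algorithms 8 and 9 (with a small extra guard for newly added implications whose left-hand side is already contained in the closed set), not the authors' optimized implementation. The toy context at the bottom is also our own:

```python
def lincbo(c, n):
    """Compact sketch of LinCbO (Algorithms 8 and 9). c is a closure
    operator on Y = {1, ..., n}; returns the Duquenne-Guigues basis as a
    list of implications (P, P''), with P a pseudo-intent."""
    T, basis = [], []                        # theory as (L, R) pairs
    lists = {a: [] for a in range(1, n + 1)} # attribute -> implication indices

    def lin_closure_rc(B, y, Z, prev_count):
        D, Z = set(B), set(Z)
        count = dict(prev_count)             # reuse the caller's counters
        for j in range(len(T)):              # implications added since then
            if j not in count:
                count[j] = len(T[j][0] - D)
                if count[j] == 0:            # left side already inside D
                    add = T[j][1] - D        # (extra guard, not in Alg. 8)
                    if add and min(add) < y:
                        return None, None
                    D |= add
                    Z |= add
        while Z:
            m = min(Z)
            Z.discard(m)
            for j in lists[m]:
                count[j] -= 1
                if count[j] == 0:
                    add = T[j][1] - D
                    if add and min(add) < y:
                        return None, None    # early stop: canonicity fails
                    D |= add
                    Z |= add
        return frozenset(D), count

    def step(B, y, Z, prev_count):
        Bc, count = lin_closure_rc(B, y, Z, prev_count)
        if Bc is None:
            return
        intent = c(Bc)
        low = set(range(1, y))               # attributes strictly below y
        if Bc != intent:                     # Bc is a pseudo-intent
            basis.append((Bc, intent))
            T.append((Bc, intent))
            for a in Bc:
                lists[a].append(len(T) - 1)
            if intent & low == Bc & low:     # (o1): jump to the intent
                step(intent, y, intent - Bc, count)
        else:                                # Bc is an intent: extend it
            for i in range(n, y, -1):        # descending => lectic order
                if i not in Bc:
                    step(Bc | {i}, i, {i}, count)

    step(frozenset(), 0, frozenset(), {})
    return basis

# toy context (our own): object intents over Y = {1, 2, 3, 4}
ROWS = [{1, 2}, {1, 3}, {4}]
N = 4

def c(B):
    out = set(range(1, N + 1))
    for row in ROWS:
        if B <= row:
            out &= row
    return frozenset(out)

basis = lincbo(c, N)
```

For this context the sketch finds the four pseudo-intents {2}, {3}, {1, 4}, and {1, 2, 3}, each paired with its intent, which agrees with the definition of the Duquenne-Guigues basis checked by hand.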
4. Experimental Comparison
We compare LinCbO with other algorithms, namely:

• NextClosure with the naïve closure (NC1), LinClosure (NC2), and Wild's closure (NC3);
• NextClosure⁺, which is NextClosure with the improvements described in Section 3.2, with the same closures (NC⁺1, NC⁺2, NC⁺3); NextClosure and NextClosure⁺ are called Ganter and Ganter⁺ in [6];
• the attribute incremental approach [23].

Algorithm 8: LinClosure with reused counters

def LinClosureRC(B, y, Z, prevCount):
    input: B – set of attributes to be closed
           y – last attribute added to B
           Z – set of new attributes
           prevCount – previous attribute counters from the computation of B \ Z
1   D ← B
2   count ← copy of prevCount
3   Z ← Z
4   for L ⇒ R ∈ T not counted in prevCount do
5       count[L ⇒ R] ← |L \ B|
6   while Z ≠ ∅ do
7       m ← min(Z)
8       Z ← Z \ {m}
9       for L ⇒ R ∈ list[m] do
10          count[L ⇒ R] ← count[L ⇒ R] − 1
11          if count[L ⇒ R] = 0 then
12              add ← R \ D
13              if min(add) < y then
14                  return fail
15              D ← D ∪ add
16              Z ← Z ∪ add
17  return ⟨D, count⟩

Algorithm 9: LinCbO (CbO for the Duquenne-Guigues basis, final version)

T ← ∅
list[i] ← ∅ for each i ∈ Y

def LinCbOStep(B, y, Z, prevCount):
    input: B – set of attributes
           y – last attribute added to B
           Z – set of new attributes
           prevCount – attribute counters
1   ⟨B•, count⟩ ← LinClosureRC(B, y, Z, prevCount)
2   if B• is fail then
3       return
4   if B• ≠ B•↓↑ then
5       T ← T ∪ {B• ⇒ B•↓↑}
6       for i ∈ B• do
7           list[i] ← list[i] ∪ {B• ⇒ B•↓↑}
8       if (B•↓↑)_y = (B•)_y then
9           LinCbOStep(B•↓↑, y, B•↓↑ \ B•, count)
10  else
11      for i from n down to y+1, i ∉ B• do
12          LinCbOStep(B• ∪ {i}, i, {i}, count)

LinCbOStep(∅, 0, ∅, ∅)

To achieve maximal fairness, we implemented LinCbO in the framework made by Bazhanov & Obiedkov [6]. It contains implementations of all the listed algorithms. In Section 4.1, we also use the same datasets as Bazhanov and Obiedkov [6].

All experiments have been performed on a computer with 64 GB RAM, two Intel Xeon CPUs E5-2680 v2 (at 2.80 GHz), Debian Linux 10, and GNU GCC 8.3.0. All measurements have been taken ten times and the mean value is presented.

Bazhanov and Obiedkov [6] use artificial datasets and datasets from the UC Irvine Machine Learning Repository [13]. The artificial datasets are named |X|x|Y|-d, where d is the number of attributes of each object, i.e. |{x}↑| = d for each x ∈ X.
The attributes are assigned to objects randomly, with one exception, where each object misses a different attribute (more exactly, the incidence relation is the inequality).

The datasets from the UC Irvine Machine Learning Repository are: Breast-cancer, Breast-w, dbdata0, flare, Post-operative, spect, vote, and zoo. See Table 1 for properties of all the datasets.

In batch 1, LinCbO computes the basis faster than the rest of the algorithms; however, in most cases the runtimes are very small and the differences between them are negligible (see Table 2).

As the runtimes in batch 1 often differ only by a few milliseconds, we tested the algorithms on larger datasets. We used the following datasets from the UC Irvine Machine Learning Repository [13]:

• crx – Credit Approval (37 rows containing a missing value were removed),
• shuttle – Shuttle Landing Control,
• magic – MAGIC Gamma Telescope,
• bikesharing (day|hour) – Bike Sharing Dataset,

(Footnote: The framework is available at https://github.com/yazevnul/fcai.)

Table 1: Properties of the datasets in batch 1

dataset | |X| | |Y| | |I| | intents | pseudo-intents
100x30-4       | 100 | 30  | 400  | 307     | 557
100x50-4       | 100 | 50  | 400  | 251     | 1115
10x100-25      | 10  | 100 | 250  | 129     | 380
10x100-50      | 10  | 100 | 500  | 559     | 546
18x18-17       | 18  | 18  | 306  | 262,144 | 0
20x100-25      | 20  | 100 | 500  | 716     | 2269
20x100-50      | 20  | 100 | 1000 | 12,394  | 8136
50x100-10      | 50  | 100 | 500  | 420     | 3893
900x100-4      | 900 | 100 | 3600 | 2472    | 7994
Breast-cancer  | 286 | 43  | 2851 | 9918    | 3354
Breast-w       | 699 | 91  | 6974 | 9824    | 10,666
dbdata0        | 298 | 88  | 1833 | 2692    | 1920
flare          |     |     |      |         |
Post-operative | 90  | 26  | 807  | 2378    | 619
spect          | 267 | 23  | 2042 | 21,550  | 2169
vote           | 435 | 18  | 3856 | 10,644  | 849
zoo            | 101 | 28  | 862  | 379     | 141

• kegg – KEGG Metabolic Reaction Network – Undirected.

We binarized the datasets using nominal (nom), ordinal (ord), and interordinal (inter) scaling, where each numerical feature was scaled to k attributes; the resulting dataset is named ⟨scaling⟩k⟨dataset⟩. For example, inter10shuttle is the dataset 'Shuttle Landing Control' interordinally scaled to 10, using 9 equidistant cutpoints.

For this batch, we included LinCbO1 (Algorithm 7) to show how the reuse of attribute counters influences the performance.

For most datasets, LinCbO works faster than the other algorithms. For the remaining datasets, LinCbO is the second best after the attribute incremental approach (see Table 4). However, we encountered limits of the attribute incremental approach, as it runs out of available memory in three cases (denoted by the symbol * in Table 4).

Table 2: Runtimes in seconds of algorithms generating the Duquenne-Guigues basis in batch 1 (columns AttInc, NC1, NC2, NC3, NC⁺1, NC⁺2, NC⁺3, LinCbO; rows Breast-cancer, Breast-w, dbdata0, flare, Post-operative, spect, vote, zoo; the numeric values are not recoverable in this copy).

Table 3: Properties of the datasets in batch 2 (cells missing in this copy are left blank)

dataset        | |X| | |Y| | |I|    | intents    | pseudo-intents
inter10crx     | 653 | 139 | 40,170 | 10,199,818 | 20,108
inter10shuttle |     |     |        |            |
inter3magic    |     |     |        |            |
inter4magic    |     |     |        |            |
inter5bike day | 731 | 93  | 24,650 | 3,023,326  | 20,425
inter5crx      | 653 | 79  | 20,543 | 348,428    | 3427
inter5shuttle  |     |     |        |            |
inter6shuttle  |     |     |        |            |
nom10bike day  | 731 | 100 | 9293   | 52,697     | 29,773
nom10crx       | 653 | 85  | 8774   | 51,078     | 6240
nom10magic     |     |     |        |            |
nom10shuttle   |     |     |        |            |
nom15magic     |     |     |        |            |
nom20magic     |     |     |        |            |
nom5bike day   | 731 | 65  | 9293   | 61,853     | 16,296
nom5bike hour  |     |     |        |            |
nom5crx        | 653 | 55  | 8774   | 29,697     | 2162
nom5keg        |     |     |        |            |
nom5shuttle    |     |     |        |            |
ord10bike day  | 731 | 93  | 28,333 | 664,713    | 11,795
ord10crx       | 653 | 79  | 37,005 | 1,547,971  | 2906
ord10shuttle   |     |     |        |            |
ord5bike day   | 731 | 58  | 14,929 | 81,277     | 5202
ord5bike hour  |     |     |        |            |
ord5crx        | 653 | 49  | 19,440 | 139,752    | 973
ord5magic      |     |     |        |            |
ord5shuttle    |     |     |        |            |
ord6magic      |     |     |        |            |

Table 4: Runtimes in seconds of algorithms generating the Duquenne-Guigues basis in batch 2. The symbol * means that the run could not be completed due to insufficient memory; it marks inter10shuttle, inter4magic, and nom5keg under the attribute incremental approach. (Columns AttInc, NC1, NC2, NC3, NC⁺1, NC⁺2, NC⁺3, LinCbO1, LinCbO; rows as in Table 3; the numeric values are not recoverable in this copy.)

dataset | mushroom   | anonymous web | adult        | internet ads
size    | 8124 × 119 | 32,711 × 296  | 48,842 × 104 | 3279 ×

Table 5: Runtimes of formal concept enumeration by NextClosure and CbO in seconds for selected datasets (source: [24]); the runtime values are not recoverable in this copy.
Based on the experimental evaluation in Section 4, we conclude that LinCbO is the fastest algorithm for computation of the Duquenne-Guigues basis. In some cases, it is outperformed by the attribute incremental approach. However, the attribute incremental approach seems to have enormous memory requirements, as it ran out of memory for several datasets.

Originally, we believed that CbO itself can make the computation faster. This motivation came from the paper by Outrata & Vychodil [24], where CbO is shown to be significantly faster than NextClosure when computing intents (see Table 5). The main reason for the speed-up is the fact that CbO uses set intersection to efficiently obtain extents during the tree descent. This feature cannot be exploited for computation of the Duquenne-Guigues basis. CbO itself rarely seems to have a significant effect on the runtime – this was the case for the datasets nom10shuttle and nom5shuttle. Sometimes, it led to worse performance, for example for the datasets inter10crx, inter10shuttle, and nom20magic.

However, the introduction of the reuse of attribute counters significantly improves the runtime for most datasets (see Fig. 3).
5. Conclusions and further research
The algorithm LinClosure has been considered slow, even worse than the naïve closure [26, 6]. In an experimental evaluation, we have shown that it can perform very fast when it can reuse its attribute counters. The reuse is enabled by using CbO.

As our future research, we want to further develop the present algorithm.

• One of the benefits of CbO is that it can be improved to avoid some unnecessary closure computations. This improvement, called pruning, is in various ways utilized in FCbO [24] and In-Close ver. 3 and higher.
• A generalization of LinClosure is used to compute models in generalized settings, like fuzzy attribute implications [8, 10, 11] and temporal attribute implications [25]. We will explore potential uses of LinCbO in these generalizations.
• Algorithms for enumeration of closed sets can be extended to handle background knowledge given as a set of attribute implications or as a constraint closure operator [9]. Adding background knowledge to the computation of the Duquenne-Guigues basis was investigated by Kriegel [17]. We will explore this possibility for LinCbO.
• The implementation used for the experimental evaluation was made to be at a similar level to the Bazhanov and Obiedkov implementations [6]. We will deliver an optimized implementation of LinCbO, possibly with a pruning technique.

Figure 3: Comparison of NextClosure with LinClosure with an early stop (NC⁺2) on the batch 2 datasets; the plotted values are not recoverable in this copy.
Acknowledgment
The authors acknowledge support by the grants
• IGA UP 2020 of Palacký University Olomouc, No. IGA PrF 2020 019,
• JG 2019 of Palacký University Olomouc, No. JG 2019 008.