Finding Robust Itemsets Under Subsampling
NIKOLAJ TATTI, HIIT, Aalto University, KU Leuven, University of Antwerp
FABIAN MOERCHEN, Amazon
TOON CALDERS, Université Libre de Bruxelles, Eindhoven University of Technology
Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset, such as closedness or non-derivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties: whether an itemset is closed, free, non-derivable, or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model, and in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.

Categories and Subject Descriptors: H.2.8 [Database Management]: Data Mining

General Terms: Algorithms, Experimentation, Theory

Additional Key Words and Phrases: pattern reduction, robust itemsets, closed itemsets, free itemsets, non-derivable itemsets, totally shattered itemsets
ACM Reference Format:
ACM Trans. Datab. Syst. V, N, Article A (January YYYY), 27 pages. DOI = 10.1145/0000000.0000000
1. INTRODUCTION
Frequent itemset mining was first introduced in the context of market basket analysis [Agrawal et al. 1993]. This problem can be defined as follows: a transaction is a subset of a given set of items A, and a transaction database is a set of such transactions. A subset X of A is a frequent itemset in a transaction database if the number of transactions containing all items of X exceeds a given threshold. Since its proposal,

The research described in this paper builds upon and extends the work presented in the IEEE International Conference on Data Mining (ICDM IEEE 2011) [Tatti and Moerchen 2011]. Part of this work was done while Nikolaj Tatti was employed by the ADReM Research Group, Department of Mathematics and Computer Science, University of Antwerp, and the DTAI group, Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium. In addition, Fabian Moerchen was employed by Siemens Corporate Research, USA, and Toon Calders was employed by the Faculty of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands. Nikolaj Tatti was partly supported by a Post-Doctoral Fellowship of the Research Foundation - Flanders (FWO).

Authors' address: N. Tatti, Helsinki Institute for Information Technology, Department of Information and Computer Science, Aalto University, Finland. F. Moerchen, Amazon, Seattle, Washington, USA. T. Calders, WIT group, Computer & Decision Engineering department, Université Libre de Bruxelles, Belgium.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA.
© YYYY ACM 0362-5915/YYYY/01-ARTA $10.00
DOI 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000
ACM Transactions on Database Systems, Vol. V, No. N, Article A, Publication date: January YYYY.

frequent itemset mining has been used to address many data mining problems such as association rule generation [Hipp et al. 2000], clustering [Wang et al. 1999], classification [Cheng et al. 2007], temporal data mining [Moerchen et al. 2010], and outlier detection [Smets and Vreeken 2011]. The mining of itemsets is a core step in these methods that often dominates the overall complexity of the problem. The number of frequent itemsets, however, can be extremely large even for moderately sized datasets; in the worst case, the number of frequent itemsets is exponential in |A|. This explosion severely complicates manual analysis or further automated processing steps.

Therefore, researchers have proposed many solutions to reduce the number of patterns, depending on the context in which the patterns are to be used or the process in which the data was generated. Examples of reduced pattern collections include: closed itemsets [Pasquier et al. 1999] to avoid redundant association rules, constrained itemsets [Pei et al. 2001] to incorporate prior knowledge, condensed representations [Calders et al. 2006] to answer frequency queries with limited memory, margin-closed itemsets [Moerchen et al. 2010] for exploratory analysis, and surprising itemsets [Brin et al. 1997; Tatti 2008] or top-k patterns [Geerts et al. 2004] for itemset ranking.

Many of these reduction techniques have the drawback of being fragile. For example, a closed itemset can be defined as an itemset that can be written as the intersection of transactions; that is, all of its supersets are contained in strictly fewer transactions. Given a non-closed itemset X, adding a single transaction containing only X to the dataset will make X closed. In this paper we introduce a novel theoretical framework that uses this drawback to its advantage.
Given a property of an itemset (closedness or non-derivability, for example) we can measure the robustness of this property. A property of X is robust if it holds for many datasets subsampled from the original data. We demonstrate that we can compute this measure analytically for several important classes of itemsets: closed [Pasquier et al. 1999], free [Boulicaut et al. 2003], non-derivable [Calders and Goethals 2007], and totally shattered itemsets [Mielikäinen 2005]. Computing robust itemsets under subsampling turns out to be practical for free, non-derivable, and totally shattered itemsets. Unfortunately, for closed itemsets the test for robustness is prohibitively expensive.

A possible drawback of our approach is that it depends on a parameter α, the probability of including a transaction in a subsample. In addition to providing reasonable guidelines for choosing α, we also introduce a technique that makes us independent of α. We show that there is a neighborhood of α near 1 in which the ranking of itemsets does not depend on α. We further demonstrate how we can compute this ranking without actually discovering the exact neighborhood or computing the measure for the itemsets. We give exact solutions for free, non-derivable, and totally shattered itemsets, and provide practical heuristics for closed itemsets.

In the remainder of this paper we describe related work and motivate our approach in Section 2. Itemsets robust under subsampling, and algorithms to find them, are described in Section 3. We discuss ordering itemsets based on large values of α in Sections 4–5. Section 6 demonstrates how the subsampling approach can reduce the number of reported itemsets significantly. The results are discussed in comparison with approximate itemsets in Section 7.
2. RELATED WORK AND MOTIVATION
The design goal of condensed representations [Calders et al. 2006] of frequent itemsets is to be able to answer all possible frequency queries. For example, non-derivable itemsets [Calders and Goethals 2007] exclude any itemset whose support can be derived exactly from the supports of its subsets using logical rules. Other examples of such complete collections are the closed and the free itemsets, which are based upon the notion of equivalence of itemsets. Two itemsets are equivalent if they are supported by exactly the same set of transactions. This notion of equivalence divides the frequent itemsets into equivalence classes. The unique maximal element of each equivalence class is a closed itemset [Pasquier et al. 1999]. No more items can be added to this set without losing some supporting transactions. The not necessarily unique minimal elements of the equivalence class are free itemsets [Boulicaut et al. 2003], or generators. No items can be taken out without adding transactions to their support set. Complete condensed representations such as those based upon the non-derivable, closed, and free sets allow the derivation of the support of all frequent itemsets. Such representations are useful because they are more compact, yet they still support further mining tasks such as the generation of association rules, where the frequencies of all subsets of an itemset are needed to determine the confidence of all possible rules.

Nevertheless, even the number of closed and free itemsets can still be very large when the minimum support threshold is low. Also, for other tasks, knowing the frequency of all frequent itemsets may be less useful because there is a large redundancy in the set of frequent itemsets. By using approximate methods the number of patterns can be further reduced, for instance by clustering itemsets representing similar sets of transactions [Xin et al. 2005], enforcing itemsets to have a minimum margin of difference in support [Moerchen et al. 2010], or ranking itemsets by significance [Brin et al. 1997; Gallo et al. 2007; Webb 2007; Tatti 2008].

The above approaches have in common that the complete dataset is considered and no assumption on potential noise is made. In fault-tolerant approaches the strict definition of support, requiring all items of an itemset to be present in a transaction, is relaxed, see [Gupta et al. 2008; Calders et al. 2007; Uno and Arimura 2007; Lucchese et al.
2010]. Rather, it is assumed that items can be present or absent at random in the transactions. These approaches can reveal important structures in noisy data that might otherwise get lost in a huge number of fragmented patterns. One needs to be aware, though, that they report approximate support values and possibly list itemsets that are not observed as such in the collection at all, or only with much smaller support. Also, the design goal is not to reduce the number of reported patterns. Only [Cheng et al. 2006] considers the combination of the two approaches and studies closedness in combination with fault tolerance.

Furthermore, a third class of techniques considers a statistical null hypothesis and ranks patterns according to how much their support deviates from their expected support under the null model [Brin et al. 1997; Gallo et al. 2007; Webb 2007; Tatti 2008]. Unlike these approaches, we do not assume a statistical null hypothesis. We also do not assume any noise model, such as flipping the values of a matrix independently. Instead, our goal is to study the robustness of a given property based on subsampling transactions.

The idea of using random databases to assess data mining results has been proposed in [Gionis et al. 2007; Hanhijärvi et al. 2009; De Bie 2011]. The goal is to first infer some (simple) background information from a dataset, and then consider all datasets that have the same statistics. A data mining result is then deemed interesting only if it appears in a small number of these datasets. Interestingly enough, this is the opposite of what we are considering to be important; that is, we want to find itemsets that satisfy the predicate in many random subsets of the data. This philosophical difference can be explained by the completely orthogonal randomizations. The authors in the aforementioned papers sample random datasets from simple statistics, that is, they ignore on purpose complex interactions between items, and try to explain mining results with simple information.
Our goal is not to explain results but rather to test whether our results are robust, by testing how data mining results change if we remove transactions.

An idea of using random datasets to compute the smoothness of results has been proposed in [Misra et al. 2012]. The idea is to measure how stable the results are by sampling random datasets from a distribution that favors datasets close to the original one, and computing the average deviation from the original result in the sampled datasets. Finally, the stability of rankings has been studied in the context of networks, see for example [Ghoshal and Barabási 2011].
3. ROBUST ITEMSETS
In this section we define the robustness and describe how to compute it efficiently.
We begin by reviewing the preliminaries and introducing the notation used in the paper.

A binary dataset D is a set of transactions, tuples (tid, t) consisting of a transaction id and a binary vector t ∈ {0, 1}^K of length K. The ith element of a transaction corresponds to an item a_i; a 1 in the ith position indicates that the transaction contains the item, a 0 that it does not. We denote the collection of all items by A = {a_1, . . . , a_K}. If S is a set of binary vectors of length K, we will write D ∩ S to denote {(tid, t) ∈ D | t ∈ S}.

An itemset X is a subset of A. Given a binary vector t of length K and an itemset X, we define t_X to be the binary vector of length |X| obtained by keeping only the positions corresponding to the items in X. Given an itemset X = (x_1, . . . , x_N) and a binary vector v of length N, we define the support

    sp(X = v; D) = |{(tid, t) ∈ D | t_X = v}|

to be the number of transactions in D where the items in X obtain the values given in v. We often omit D from the notation when it is clear from the context. In addition, if v contains only 1s, we simply write sp(X). Note that sp(X) coincides with the traditional definition of support for X. Discovering frequent itemsets, that is, itemsets whose support exceeds some given threshold, is a well-studied problem.

Example 3.1. Throughout the paper we will use the following dataset D as a running example:

    D =  1: 0 0 0 0 1
         2: 0 1 0 1 1
         3: 1 1 1 1 1
         4: 0 1 0 1 1
         5: 1 1 1 1 1
         6: 1 0 0 0 0

D contains 5 items, a, b, c, d, and e, and 6 transactions. For this dataset we have sp(ab) = 2 and sp(ab = [1, 0]) = 1.

We say that a function f mapping an itemset X to a real number f(X) is monotonically decreasing if for each Y ⊆ X we have f(Y) ≥ f(X). A classic pattern mining task is to discover all itemsets having f(X) ≥ ρ, given a threshold ρ and a function f mapping an itemset to a real number. If this function turns out to be monotonically decreasing, then we can use efficient pattern mining algorithms to discover all patterns satisfying this criterion.

Our next step is to define four different properties of itemsets: closed, free, non-derivable, and totally shattered itemsets. The goal of this work is to study how to introduce a measure of robustness for these properties.

Closed Itemsets.
An itemset X is said to be closed if there is no Y ⊋ X such that sp(X) = sp(Y), i.e., X is maximal w.r.t. set inclusion among the itemsets having the same support. We define a predicate

    σ_c(X; D) = 1 if X is closed in D, and 0 otherwise.

Every closed itemset corresponds to the intersection of a subset of transactions in D, and vice versa.

Free Itemsets.
An itemset X is said to be free if there is no Y ⊊ X such that sp(X) = sp(Y), i.e., free itemsets are minimal among the itemsets having the same support. We define a predicate

    σ_f(X; D) = 1 if X is free in D, and 0 otherwise.

A vital property of free itemsets is that they constitute a downward closed collection, allowing efficient mining with an Apriori-style algorithm (see Theorem 1 in [Boulicaut et al. 2000]). That is, if an itemset X is free, all its subsets are free as well.

Example 3.2. The closed itemsets in our running example are a, e, bde, and abcde. On the other hand, the itemsets ∅, a, b, c, d, e, ab, ad, and ae are free.

Non-derivable Itemsets.
An itemset X is said to be derivable if we can derive its support from the supports of the proper subsets of X; otherwise the itemset is called non-derivable. We define a predicate

    σ_n(X; D) = 1 if X is non-derivable in D, and 0 otherwise.

Non-derivable itemsets form a downward closed collection (Corollary 3.4 in [Calders and Goethals 2007]), hence we can mine them using an Apriori-style approach.

We say that an itemset X is totally shattered if sp(X = v) > 0 for all possible binary vectors v. In other words, every possible combination of values for X occurs in D. Again, we define a predicate

    σ_s(X; D) = 1 if X is totally shattered in D, and 0 otherwise.

Totally shattered itemsets are related to the VC-dimension [Mielikäinen 2005], and we can show that a totally shattered itemset is always free and non-derivable (but not the other way around).
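To make the four predicates concrete, here is a small Python sketch (our own illustration, not code from the paper; all function names are ours) that encodes the running example and checks the predicates by brute force, directly from the definitions:

```python
from itertools import product

# Running-example dataset: 6 transactions over items a..e (columns 0..4).
ITEMS = "abcde"
D = [(1, (0, 0, 0, 0, 1)),
     (2, (0, 1, 0, 1, 1)),
     (3, (1, 1, 1, 1, 1)),
     (4, (0, 1, 0, 1, 1)),
     (5, (1, 1, 1, 1, 1)),
     (6, (1, 0, 0, 0, 0))]

def sp(X, v, data=D):
    """sp(X = v; D): number of transactions whose restriction to X equals v."""
    return sum(1 for _, t in data if all(t[i] == b for i, b in zip(X, v)))

def support(X, data=D):
    """Traditional support sp(X): all items of X set to 1."""
    return sp(X, (1,) * len(X), data)

def is_free(X, data=D):
    """Free: no proper subset has the same support."""
    return all(support([i for i in X if i != x], data) != support(X, data)
               for x in X)

def is_closed(X, data=D):
    """Closed: no proper superset has the same support."""
    return all(support(sorted(set(X) | {y}), data) != support(X, data)
               for y in range(len(ITEMS)) if y not in X)

def is_totally_shattered(X, data=D):
    """Totally shattered: every 0/1 combination over X occurs in the data."""
    return all(sp(X, v, data) > 0 for v in product((0, 1), repeat=len(X)))

ab, ac, bde = [0, 1], [0, 2], [1, 3, 4]      # itemsets as column indices
print(support(ab), sp(ab, (1, 0)))           # prints: 2 1
print(is_totally_shattered(ab), is_totally_shattered(ac))  # prints: True False
print(is_closed(bde), is_free(ab))           # prints: True True
```

Running this reproduces the running example: sp(ab) = 2, sp(ab = [1, 0]) = 1, ab is totally shattered while ac is not, and bde is closed.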
Example 3.3. Itemset ab in the running example is totally shattered. Itemset ac is non-derivable but not totally shattered, because sp(ac = [0, 1]) = 0.

It is easy to see from the definition that totally shattered itemsets constitute a downward closed collection, hence they are easy to mine using an Apriori-style approach.

In this section we propose a measure of robustness for itemsets with a predicate σ. The idea is to sample random subsets from a given dataset and measure how often the predicate σ(X) holds in a random dataset. Intuitively, we consider an itemset robust if the predicate is true for many subsets of the database. In order to define the measure formally, we first define a probability for a subset of D.

Definition 3.4. Given a binary dataset D and a real number α, 0 ≤ α ≤ 1, we define a random dataset D_α obtained from D by keeping each transaction with probability α, or otherwise discarding it. More formally, let S be a subset of D. The probability of D_α = S is equal to

    p(D_α = S) = α^{|S|} (1 − α)^{|D|−|S|}.    (1)

We can now define the robustness of an itemset X as the probability of σ(X) being true in a random dataset.

Definition 3.5. Given a binary dataset D, a real number α, and an itemset predicate σ, we define the robustness to be the probability that σ(X; D_α) = 1, that is,

    r(X; σ, D, α) = p(σ(X; D_α) = 1) = Σ_{σ(X;S)=1} p(D_α = S).

For notational clarity, we will omit D and α when they are clear from the context.

Example 3.6. Consider itemset ab in our running example. Let α = 1/3. Note that sp(ab = [0, 0]) = sp(ab = [1, 0]) = 1 and sp(ab = [0, 1]) = sp(ab = [1, 1]) = 2. In order for ab to still be totally shattered on a subset, each of these supports needs to stay greater than zero. The probability of this event is equal to

    1/3 × 1/3 × (1 − 2/3 × 2/3) × (1 − 2/3 × 2/3) = 25/729,

because for the first two cases we need to sample the single transaction upholding the property, and for the other two cases we need to make sure we do not skip both of the transactions we need to uphold the property.

Our main goal is to mine itemsets for which the robustness measure exceeds some given threshold ρ, that is, to find all itemsets for which r(X; σ, D, α) ≥ ρ. In order to mine all significant patterns we need to show that the robustness measure is monotonically decreasing. This is indeed the case if the underlying predicate is monotonically decreasing.

PROPOSITION 3.7.
Let σ be a monotonically decreasing predicate. Then r(X; σ, D, α) is also monotonically decreasing.

PROOF. Let Y and X be itemsets such that Y ⊂ X. Then

    r(X; σ, D, α) = Σ_{σ(X;S)=1} p(D_α = S) ≤ Σ_{σ(Y;S)=1} p(D_α = S) = r(Y; σ, D, α),

which proves the proposition.

As pointed out in Section 3.1, the predicates for free, non-derivable, and totally shattered itemsets are monotonically decreasing. However, the predicate for closedness is not monotonically decreasing.

We will finish this section by considering how robustness depends on α. If we set α = 1, then r(X; σ, D, α) = σ(X; D). Naturally, we expect that when we lower α, the robustness decreases. This holds for predicates that satisfy a specific property.

Definition 3.8. We say that a predicate σ is monotonic w.r.t. deletion if for each itemset X, each dataset D, and each transaction t ∈ D it holds that if σ(X; D) = 0, then σ(X; D − t) = 0.

PROPOSITION 3.9. Let σ be a predicate monotonic w.r.t. deletion. Then r(X; σ, D, α) ≤ r(X; σ, D, β) for α ≤ β.

PROOF. We will prove the proposition by induction over |D|. The proposition holds trivially for |D| = 0. Assume that the theorem holds for |D| = N, and let D be a dataset with |D| = N + 1.

Fix t ∈ D and define a new predicate σ_t(X; S) = σ(X; S ∪ {t}), where S is a dataset. σ_t is monotonic w.r.t. deletion: otherwise, if there were a dataset S, a transaction u ∈ S, and an itemset Y violating the monotonicity, then S ∪ {t}, the same transaction u, and the itemset Y would violate the monotonicity for σ.

Moreover, since σ is monotonic w.r.t. deletion, it holds that σ(X; S) ≤ σ_t(X; S). This in turn implies that

    r(X; σ, S, α) ≤ r(X; σ_t, S, α).    (2)

Let us write D′ = D − {t}. Then we have

    r(X; σ, D, α) = (1 − α) r(X; σ, D′, α) + α r(X; σ_t, D′, α)
                  ≤ (1 − β) r(X; σ, D′, α) + β r(X; σ_t, D′, α)
                  ≤ (1 − β) r(X; σ, D′, β) + β r(X; σ_t, D′, β)
                  = r(X; σ, D, β),

where the first inequality holds because of Equation 2 and the second because of the induction assumption. This proves the proposition.

It turns out that all the predicates we considered in Section 3.1 are monotonic w.r.t. deletion.

PROPOSITION 3.10.
Predicates σ_c, σ_f, σ_n, and σ_s are monotonic w.r.t. deletion.

In order to prove the case for non-derivable itemsets we will need the following technical lemma. We will also use this lemma later on.

LEMMA 3.11. An itemset X is derivable if and only if there are two vectors v and w of length |X|, with v having an odd number of 1s and w having an even number of 1s, such that sp(X = v) = sp(X = w) = 0.

PROOF. Let O be the set of binary vectors of length |X| having an odd number of 1s, and let E be the set of binary vectors of length |X| having an even number of 1s. An alternative way of describing non-derivable itemsets is to compute the quantities

    u = sp(X) + min_{x ∈ O} sp(X = x)  and  l = sp(X) − min_{x ∈ E} sp(X = x).

We can show that l ≤ sp(X) ≤ u, and both u and l can be computed from the proper subsets of X with the inclusion-exclusion principle (see [Calders and Goethals 2007]). We also know that an itemset is derivable if and only if u = l (see [Calders and Goethals 2007]); this is because we know then that l = sp(X) = u. Let v = arg min_{x ∈ O} sp(X = x) and w = arg min_{x ∈ E} sp(X = x). This implies that u − l = sp(X = v) + sp(X = w), which proves the lemma.

PROOF OF PROPOSITION 3.10. An itemset X is not totally shattered if and only if there is a vector v such that sp(X = v; D) = 0. This immediately implies that sp(X = v; D − {t}) = 0. Thus σ_s is monotonic w.r.t. deletion. Similarly, Lemma 3.11 implies that σ_n is monotonic w.r.t. deletion.

An itemset X is not free if there is an x ∈ X such that there is no transaction u ∈ D for which u_x = 0 and u_y = 1 for all y ∈ X − {x}. If this holds in D, then it holds for D − {t}. This makes σ_f monotonic w.r.t. deletion. Similarly, an itemset X is not closed if there is an x ∉ X such that there is no transaction u ∈ D for which u_x = 0 and u_y = 1 for all y ∈ X. If this holds in D, then it holds for D − {t}. This makes σ_c monotonic w.r.t. deletion.

Example 3.12. The itemset bd is not closed because its superset bde is always observed when bd is observed. No matter which transaction we delete (one with or without bde), this will not change.
Note, however, that bde can become non-closed: if transactions 2 and 4 are deleted, then abcde will have the same support of 2.

Table I. Computational complexity of robustness and orders. Computing measures is explained in Section 3.3. Computing orders is explained in Section 4. K is the number of items, |C| is the number of frequent closed itemsets.

    predicate            measure          order            order estimate
    free                 O(|X|)           O(|X|)           –
    totally shattered    O(2^|X|)         O(2^|X|)         –
    closed               O(2^{K−|X|})     O(2^{K−|X|})     O(|C|)
    non-derivable        O(2^|X|)         O(|D| · |X|)     –

In this section we demonstrate how to compute the robustness measure for the predicates. Computing the measure directly from the definition is impractical, since D has 2^{|D|} different subsamples. It turns out that free, non-derivable, and totally shattered itemsets have practical formulas, while the robustness measure for closed itemsets has no practical formulation (see Table I).

We will first demonstrate how to compute robustness for free and totally shattered itemsets. In order to do that we introduce the following function: given an itemset X and a set of binary vectors V ⊆ {0, 1}^{|X|}, we define

    o(X, V, α) = ∏_{v ∈ V} (1 − (1 − α)^{sp(X = v)}).

Intuitively, o(X, V, α) denotes the probability of the following event: for every vector v ∈ V, sp(X = v; D_α) > 0. Note that since every transaction can support at most one X = v, the events sp(X = v; D_α) > 0 are independent from each other. Note that we can compute o(X, V, α) in O(|V|) time. Our next step is to show that the robustness of free itemsets can be expressed with o(X, V, α) for a certain set of vectors V.

PROPOSITION 3.13.
Given an itemset X, let V be the set of |X| vectors having |X| − 1 ones and one 0. The robustness of a free itemset is r(X; σ_f, α) = o(X, V, α).

PROOF. Given an item x ∈ X, define the event T_x = sp(X − {x}; D_α) > sp(X; D_α). X is still free in D_α if T_x is true for all x ∈ X. T_x is true if and only if D_α contains a transaction t with t_x = 0 and t_y = 1 for y ∈ X − {x}. There are sp(X = v; D) such transactions, where v ∈ V is the vector for which v_x = 0. p(T_x) is the probability of not removing all these transactions, thus p(T_x) = 1 − (1 − α)^{sp(X = v; D)}. Since each of these transactions is missing only one x ∈ X, there are no common transactions between different events T_x, making them independent. Thus, we can conclude r(X; σ_f, α) = ∏_{x ∈ X} p(T_x) = o(X, V, α).

A similar result also holds for totally shattered itemsets.

PROPOSITION 3.14.
Given an itemset X, let V be the set of all binary vectors of length |X|. The robustness of a totally shattered itemset is r(X; σ_s, α) = o(X, V, α).

PROOF. Given a binary vector v ∈ V, define the event T_v = sp(X = v; D_α) > 0. X is still totally shattered in D_α if T_v is true for all v ∈ V. p(T_v) is the probability of not removing all these transactions, thus p(T_v) = 1 − (1 − α)^{sp(X = v; D)}. Again, since no transaction can contribute to different T_v being true, the random variables are independent and we obtain r(X; σ_s, α) = ∏_{v ∈ V} p(T_v) = o(X, V, α).

Note that the formula in Proposition 3.14 corresponds directly to Example 3.6.

Let us now consider non-derivable itemsets. The analytic formula is somewhat more complicated than for free or totally shattered itemsets, although the principle remains exactly the same.

PROPOSITION 3.15.
Given an itemset X, let V be the set of binary vectors of length |X| having an odd number of ones. Similarly, let W be the set of binary vectors of length |X| having an even number of ones. The robustness of a non-derivable itemset is

    r(X; σ_n, α) = 1 − (1 − o(X, V, α))(1 − o(X, W, α)).

PROOF. Let us define the event T_V to be that there is no v ∈ V such that sp(X = v) = 0. Similarly, let T_W be the event that there is no w ∈ W such that sp(X = w) = 0. According to Lemma 3.11, an itemset X is derivable if and only if T_V and T_W are both false. Using the same argument as with Proposition 3.14, we see that p(T_V) = o(X, V, α). Similarly, p(T_W) = o(X, W, α). Since V ∩ W = ∅, the events T_V and T_W are independent. Hence, r(X; σ_n, α) is equal to

    1 − p(¬T_V ∧ ¬T_W) = 1 − (1 − p(T_V))(1 − p(T_W)).

This completes the proof.

We will now consider closed itemsets. Unlike for the free/totally shattered itemsets, there is an exponential number of terms in the expression for the robustness. The key problem is that while we can write the robustness in a similar fashion as we did in the proofs of the previous propositions, the events sp(X ∪ {y}) < sp(X) for all y ∈ A \ X will no longer be independent, and hence we cannot multiply the probabilities of the individual events. Indeed, in our running example, bde is a closed itemset. The events sp(abde; D_α) < sp(bde; D_α) and sp(bcde; D_α) < sp(bde; D_α) are clearly dependent, since both events occur in exactly the same subsamples, namely those that contain at least one of the transactions 3 and 5.

PROPOSITION 3.16.
The robustness of a closed itemset is

    r(X; σ_c, α) = Σ_{Y ⊇ X} (−1)^{|Y|−|X|} (1 − α)^{sp(X)−sp(Y)}.

PROOF. Given an item y ∉ X, define the event E_y = (sp(X ∪ {y}; D_α) = sp(X; D_α)). Itemset X is still closed in D_α if all E_y are false, thus r(X; σ_c, α) is equal to

    1 − p(∨_{y ∉ X} E_y) = Σ_{Z ⊆ (A \ X)} (−1)^{|Z|} p(∧_{y ∈ Z} E_y),

where the equality follows from the inclusion-exclusion principle. Through this transformation we now need to determine the probability of all E_y, y ∈ Z, simultaneously being true. For this, all sp(X) − sp(Z ∪ X) transactions containing X but not Z must have been excluded from D_α, hence

    p(∧_{y ∈ Z} E_y) = (1 − α)^{sp(X)−sp(Z∪X)}.

Substituting this above and writing Y = X ∪ Z leads to the proposition.

Example 3.17. In our running example, we have sp(bde) = 4. This itemset has three super-itemsets, with supports sp(abde) = sp(bcde) = sp(abcde) = 2. Hence, the measure r(bde; σ_c, α) is equal to

    1 − (1 − α)^{4−2} − (1 − α)^{4−2} + (1 − α)^{4−2} = 1 − (1 − α)^2,

where the itemsets bde, abde, bcde, and abcde correspond to the terms in the given order.
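The analytic formulas can be checked against the definition by exhaustive enumeration, which is feasible for the six-transaction running example. The Python sketch below (our own illustration; all names are ours) computes r(X; σ, D, α) both by summing over all 2^|D| subsamples and via the closed forms: o(X, V, α) for totally shattered itemsets (Proposition 3.14) and the inclusion-exclusion sum for closed itemsets, using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import combinations, product

# Running-example dataset; columns are items a..e.
D = [(0, 0, 0, 0, 1), (0, 1, 0, 1, 1), (1, 1, 1, 1, 1),
     (0, 1, 0, 1, 1), (1, 1, 1, 1, 1), (1, 0, 0, 0, 0)]

def sp(X, v, data):
    return sum(1 for t in data if all(t[i] == b for i, b in zip(X, v)))

def shattered(X, data):
    return all(sp(X, v, data) > 0 for v in product((0, 1), repeat=len(X)))

def robustness_by_definition(pred, X, data, alpha):
    """Sum p(D_alpha = S) over every subsample S on which pred holds."""
    n, total = len(data), Fraction(0)
    for k in range(n + 1):
        for S in combinations(data, k):
            if pred(X, list(S)):
                total += alpha**k * (1 - alpha)**(n - k)
    return total

def o(X, V, alpha, data):
    """o(X, V, alpha) = product over v in V of (1 - (1 - alpha)^sp(X = v))."""
    out = Fraction(1)
    for v in V:
        out *= 1 - (1 - alpha)**sp(X, v, data)
    return out

def r_shattered(X, alpha, data):
    # Closed form for totally shattered itemsets: all vectors of length |X|.
    return o(X, list(product((0, 1), repeat=len(X))), alpha, data)

def r_closed(X, alpha, data):
    # Inclusion-exclusion sum over supersets Y = X u Z, with Z in A \ X.
    rest = [y for y in range(len(data[0])) if y not in X]
    spX = sp(X, (1,) * len(X), data)
    total = Fraction(0)
    for k in range(len(rest) + 1):
        for Z in combinations(rest, k):
            spY = sp(list(X) + list(Z), (1,) * (len(X) + k), data)
            total += (-1)**k * (1 - alpha)**(spX - spY)
    return total

alpha = Fraction(1, 3)
ab, bde = [0, 1], [1, 3, 4]
print(r_shattered(ab, alpha, D))                         # prints: 25/729
print(r_shattered(ab, alpha, D) ==
      robustness_by_definition(shattered, ab, D, alpha)) # prints: True
print(r_closed(bde, alpha, D) == 1 - (1 - alpha)**2)     # prints: True
```

The first check reproduces the 25/729 of the worked example for ab with α = 1/3; the last confirms 1 − (1 − α)^2 for the closed itemset bde. Enumerating subsamples is of course exponential in |D| and only serves as a sanity check.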
Unlike with the other predicates, the analytic robustness for closed itemsets cannot be computed in practice, since there are 2^{K−|X|} terms in the analytic solution. It turns out that we cannot do much better, as computing the robustness is NP-hard.

PROPOSITION 3.18.
The following
Robustness of a Closed Itemset (RCI) problem is NP-hard: for a given database D over the set of items A, parameters α, ρ ∈ [0, 1], and an itemset X ⊆ A, decide if r(X; σ_c, D, α) ≥ ρ.

PROOF. We will reduce the well-known NP-complete vertex cover problem to the RCI problem. Let G(V, E) be a graph. For every vertex v ∈ V, we will create a unique transaction with identifier tid_v. The set of items over which the transactions will be defined is the set of edges E = {e_1, . . . , e_K}. Let t^v = [t^v_1, . . . , t^v_K] denote the binary vector of length |E| defined as: for all i = 1, . . . , K, t^v_i = 1 if and only if e_i is not incident with v. The transaction database D is now defined as

    D = {(tid_v, t^v) | v ∈ V}.

The itemset X in the RCI problem will be the empty set, X = ∅. Before we specify α and ρ, we show the following property:

LEMMA 3.19.
LEMMA. Let $S \subseteq D$; $\emptyset$ is closed in $S$ if and only if $V_S = \{v \in V \mid (tid_v, t^v) \in S\}$ is a vertex cover of $G$.

PROOF. If $\emptyset$ is closed in $S$, then for every $e$ there is $t \in S$ such that $t_e = 0$, since otherwise $sp(e) = sp(\emptyset)$. Hence, for all $e \in E$ there must exist at least one $v \in V_S$ with $t^v_e = 0$, that is, $e$ must be incident with $v$. Since $e$ was chosen arbitrarily, this implies that every edge in $E$ is covered by at least one node in $V_S$ and hence $V_S$ is a vertex cover of $G$.

This relation between the closedness of $\emptyset$ in a subsample $S$ and $V_S$ being a vertex cover allows us to establish the following relation between the robustness of $\emptyset$ in $D$ and the existence of a vertex cover of size $k$, which holds for any $\alpha \in [0, 1]$.

LEMMA. If $G$ has a vertex cover of size $k$, then
\[ r(\emptyset; \sigma_c, D, \alpha) \geq \alpha^k (1-\alpha)^{|D| - k}; \]
otherwise,
\[ r(\emptyset; \sigma_c, D, \alpha) \leq \sum_{j=k+1}^{|D|} \alpha^j (1-\alpha)^{|D| - j} \binom{|D|}{j}. \]

PROOF. Indeed, let $VC$ be a vertex cover of $G$ of size $k$; then $\emptyset$ is closed in $S = \{(tid_v, t^v) \mid v \in VC\}$. The probability that a randomly selected sample equals $S$ is
\[ L = \alpha^k (1-\alpha)^{|D| - k}, \]
which is a lower bound on the robustness of $\emptyset$. Otherwise, if there does not exist a vertex cover of size $k$, then $\emptyset$ is not closed in any subsample $S$ of size $k$ or less. Therefore, the probability mass of all subsamples with at least $k + 1$ transactions,
\[ U = \sum_{j=k+1}^{|D|} \alpha^j (1-\alpha)^{|D| - j} \binom{|D|}{j}, \]
is an upper bound on the robustness of $\emptyset$.
The proof now concludes by carefully choosing $\alpha$ such that $U \leq L$, and selecting $\rho$ such that $U \leq \rho \leq L$; in that way, the robustness of the closedness of $\emptyset$ exceeds $L$, and hence $\rho$, if $G$ has a vertex cover of size $k$ or less, and otherwise the robustness is below $U$, and hence also below $\rho$. The last step in the proof is hence to show that we can always pick $\alpha$ such that $U \leq L$. It can easily be seen that $\alpha = 2^{-(|D| + 1)}$ satisfies this condition: since $\alpha \leq 1/2$, we can bound $U$ by
\[ \sum_{j=k+1}^{|D|} \alpha^j (1-\alpha)^{|D| - j} \binom{|D|}{j} \leq \sum_{j=k+1}^{|D|} \alpha^{k+1} (1-\alpha)^{|D| - k - 1} \binom{|D|}{j} \leq 2^{|D|} \alpha^{k+1} (1-\alpha)^{|D| - k - 1}. \]
The right-hand side is smaller than $L$ if and only if $1 - \alpha \geq 2^{|D|} \alpha$. Note that for our choice of $\alpha$ we have $1 - \alpha \geq 1/2 = 2^{|D|} 2^{-(|D| + 1)} = 2^{|D|} \alpha$.

The binary representations of the numbers $\alpha$ and $\rho$ are polynomial in the size of the original vertex cover problem, and the reduction can be carried out in polynomial time.
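The reduction itself is mechanical; the following Python sketch (ours, with hypothetical helper names) builds the database for a given graph and exhaustively checks the first lemma on a small instance.

```python
def reduction_database(vertices, edges):
    """One transaction per vertex; items are the edges; transaction t^v
    contains exactly the edges NOT incident to v."""
    return {v: {e for e in edges if v not in e} for v in vertices}

def empty_set_closed(transactions, edges):
    """The empty itemset is closed iff no edge-item e has sp(e) = sp(emptyset),
    i.e. every item is missing from at least one retained transaction."""
    if not transactions:
        return False
    return all(any(e not in t for t in transactions) for e in edges)

def is_vertex_cover(vs, edges):
    """Every edge touches at least one vertex of `vs`."""
    return all(any(v in e for v in vs) for e in edges)
```

On a triangle graph, for instance, the empty itemset is closed exactly in those subsamples whose vertex sets cover all three edges.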
4. ORDERING PATTERNS
The robustness measure depends on the parameter $\alpha$. In this section we propose a parameter-free approach. The idea is to study how the measure behaves when $\alpha$ is close to 1. We can show that there is a (small) neighborhood close to 1 in which the ranking of itemsets does not depend on $\alpha$; that is, there exists $\beta < 1$ such that if $\alpha, \alpha' \in [\beta, 1]$, then $r(X; \sigma, D, \alpha) \leq r(Y; \sigma, D, \alpha)$ if and only if $r(X; \sigma, D, \alpha') \leq r(Y; \sigma, D, \alpha')$. We will show how to compute the ranking in this region, which can be used to select top-$k$ itemsets by robustness without actually computing the measure or determining $\beta$. In this section we will first give the formal definition and discuss the theoretical properties of the ranking. In the next section we demonstrate how we can compute the order in practice, that is, how to avoid determining $\beta$ and computing the actual robustness.

4.1. As $\alpha$ Approaches 1

When $\alpha = 1$, $D_\alpha = D$ with probability 1 and the measure is equivalent to the underlying predicate, providing only a crude ranking: itemsets that satisfy the predicate vs. itemsets that do not. If we make $\alpha$ slightly smaller, the measure will decrease a little for each itemset. The amount of this change will vary from one itemset to another, based on how likely removing only very few transactions is to break the predicate for this itemset. We can use the magnitude of this change to obtain a more fine-grained ranking by robustness. The key result for this is that there is a small neighborhood below 1 in which the ranking of itemsets based on the measure does not depend on $\alpha$.
PROPOSITION 4.1. Given a predicate $\sigma$ and a dataset $D$, there exists a number $\beta < 1$ such that $r(X; \sigma, D, \alpha) \leq r(Y; \sigma, D, \alpha)$ if and only if $r(X; \sigma, D, \alpha') \leq r(Y; \sigma, D, \alpha')$, for any itemsets $X$ and $Y$ and any $\beta \leq \alpha \leq 1$, $\beta \leq \alpha' \leq 1$.

PROOF. Fix $X$ and $Y$ and consider $f(\alpha) = r(X; \sigma, D, \alpha) - r(Y; \sigma, D, \alpha)$. Since the measure is a finite sum of probabilities that are, according to Eq. 1, polynomials of $\alpha$, the function $f$ is a polynomial. This implies that $f$ can have only a finite number of roots, that is, points at which $f = 0$. Consequently there is a neighborhood $N = [\beta, 1]$ such that either $f(\alpha) \geq 0$ for all $\alpha \in N$, or $f(\alpha) \leq 0$ for all $\alpha \in N$. Since there is only a finite number of itemsets, we can take the maximum of all $\beta$s to prove the theorem.
Proposition 4.1 allows us to define an order for itemsets based on the measure for $\alpha \approx 1$.

Definition 4.2. Given a predicate $\sigma$ and a dataset $D$, we say that $X \preceq_\sigma Y$, where $X$ and $Y$ are itemsets, if there exists $\beta < 1$ such that $r(X; \sigma, D, \alpha) \leq r(Y; \sigma, D, \alpha)$ for any $\alpha$ such that $\beta \leq \alpha \leq 1$. Moreover, if $r(X; \sigma, D, \alpha) < r(Y; \sigma, D, \alpha)$ for some $\alpha \geq \beta$, then we write $X \prec_\sigma Y$.

Note that Proposition 4.1 implies that $\preceq_\sigma$ is a total order. That is, we can use this relation to order itemsets.

In this section we will study the properties of the order. Namely, we will show two properties:
— We will show in Proposition 4.4 that robustness for $\alpha \approx 1$ essentially measures how many transactions we need to remove in order to make the predicate fail. The more transactions are needed, the more robust the itemset.
— We will show in Proposition 4.9 that when we increase the number of transactions, a ranking based on robustness for any fixed $\alpha$ will become equivalent to the ranking based on $\prec_\sigma$.

First, we will need the following key lemma, which can be proven by elementary real analysis.
LEMMA 4.3. Let $f(x) = \sum_{i=0}^{N} a_i x^i$ be a non-zero polynomial. Let $k$ be the first index such that $a_k \neq 0$. If $a_k > 0$, then there is $\beta > 0$ such that $0 \leq x \leq \beta$ implies $f(x) \geq 0$. Similarly, if $a_k < 0$, then there is $\beta > 0$ such that $0 \leq x \leq \beta$ implies $f(x) \leq 0$.

The lemma essentially says that if we express the robustness as a polynomial of $1 - \alpha$, then we can determine the order by studying the coefficients of the polynomial. Our first application of this lemma is a characterization of the order. Assume two itemsets $X$ and $Y$. Assume that we need to remove $n$ transactions in order to make the predicate $\sigma(Y)$ fail, and that we can make $\sigma(X)$ fail by removing fewer than $n$ transactions. Then it holds that $X \prec_\sigma Y$. The following proposition generalizes this idea.
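Both Lemma 4.3 and the counting argument of the next proposition can be brute-forced on tiny data. The Python sketch below (ours; names hypothetical) enumerates all subsamples, counts how many of each size fail a predicate, and rebuilds the robustness from those counts.

```python
import itertools

def c_vector(data, fails):
    """c[k] = number of subsamples with |D| - k transactions on which the
    predicate fails (sigma = 0)."""
    n = len(data)
    c = [0] * (n + 1)
    for k in range(n + 1):
        for sub in itertools.combinations(range(n), n - k):
            if fails([data[i] for i in sub]):
                c[k] += 1
    return c

def robustness_from_counts(c, alpha):
    """r = 1 - sum_k (1-alpha)^k alpha^{n-k} c[k]."""
    n = len(c) - 1
    return 1 - sum((1 - alpha) ** k * alpha ** (n - k) * c[k]
                   for k in range(n + 1))

def first_sign(coeffs):
    """Sign of sum_i a_i x^i just above x = 0: decided by the first
    non-zero coefficient (Lemma 4.3)."""
    for a in coeffs:
        if a:
            return 1 if a > 0 else -1
    return 0

# Closedness of bde on a toy dataset: the predicate fails exactly when
# both 'bde'-only transactions are removed.
D = [set('abcde'), set('abcde'), set('bde'), set('bde'), set('a'), set('e')]
def bde_not_closed(sample):
    sp = sum(1 for t in sample if set('bde') <= t)
    return any(sum(1 for t in sample if set('bde') | {y} <= t) == sp
               for y in set('abcde') - set('bde'))
```

The failing subsamples are exactly the subsets avoiding both $bde$ rows, so the counts are binomial coefficients over the remaining four rows.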
PROPOSITION 4.4. Let $\sigma$ be a predicate, $X$ and $Y$ two itemsets, and $D$ a dataset. Define a vector $c(X)$ of length $|D| + 1$ such that $c_k(X)$ is the number of subsamples of $D$ with $|D| - k$ points failing the predicate $\sigma(X)$. Similarly, define $c(Y)$. Then $c(X) = c(Y)$ implies that $r(X; \sigma, D, \alpha) = r(Y; \sigma, D, \alpha)$ for any $\alpha$. If $c(X)$ is larger than $c(Y)$ in lexicographical order, then $X \prec_\sigma Y$.

PROOF. Let us first write the robustness of $X$ using the vector $c_k(X)$. We have
\[ 1 - r(X; \sigma, D, \alpha) = p(\sigma(X; D_\alpha) = 0) = \sum_{k=0}^{|D|} p(\sigma(X; D_\alpha) = 0,\ |D_\alpha| = |D| - k) = \sum_{k=0}^{|D|} (1-\alpha)^k \alpha^{|D|-k} \sum_{\substack{S \subseteq D \\ |S| = |D|-k}} (1 - \sigma(X; S)) = \sum_{k=0}^{|D|} (1-\alpha)^k \alpha^{|D|-k} c_k(X). \]
If $c_k(X) = c_k(Y)$ for all $k$, it follows immediately that the robustness values of $X$ and $Y$ are identical.
Assume now that $c(X)$ is larger than $c(Y)$ in lexicographical order. That is, there is $l$ such that $c_l(X) > c_l(Y)$ and $c_k(X) = c_k(Y)$ for $k < l$. We have
\[ r(Y; \sigma, D, \alpha) - r(X; \sigma, D, \alpha) = \sum_{k=l}^{|D|} (1-\alpha)^k \alpha^{|D|-k} (c_k(X) - c_k(Y)) = (c_l(X) - c_l(Y)) (1-\alpha)^l + f(1-\alpha), \]
where $f(x)$ is a polynomial in which the degree of every individual term is larger than $l$. Lemma 4.3 now proves the proposition.

Interestingly enough, if we were to define the order based on $\alpha \approx 0$, we would obtain a similar result, with the difference that instead of deleting transactions we would be adding them. We would rank $Y$ higher than $X$ if we can satisfy $\sigma(Y)$ with fewer transactions than the number of transactions needed to satisfy $\sigma(X)$.

Ranking itemsets based on how many transactions can be deleted is similar to the breakdown point that measures the robustness of statistical estimators. The breakdown point of an estimator such as the mean is the number of observations that can be made arbitrarily large before the estimator becomes arbitrarily large as well. The breakdown value of the mean is 1: it becomes infinite as soon as one observation is set to infinity. In contrast, the median can handle just under half of the observations being set to infinity before it breaks down.

We will next show that, in essence, for large datasets the robustness for any fixed $0 < \alpha < 1$ will produce the same ranking as the order defined for $\alpha$ close to 1. For this we will consider predicates only of a certain type. The reason for this is to avoid some pathological predicates, for example, $\sigma(X; D) = 1$ if $|D|$ is even, and $0$ otherwise.

Definition 4.5. Let $\sigma$ be a predicate. Let $K$ be the number of items and let $X$ be an itemset. We say that $\sigma$ is a monotone CNF predicate if there is a collection $\{B_i\}_{i=1}^{L}$ of sets of binary vectors of length $K$, (possibly) depending on $X$ and $K$, such that
\[ \sigma(X; D) = \begin{cases} 1 & \text{if } D \cap B_i \neq \emptyset \text{ for each } i = 1, \ldots, L, \\ 0 & \text{otherwise}, \end{cases} \]
that is, in order to have $\sigma(X; D) = 1$, $D$ must contain a transaction from each $B_i$. Every predicate we consider in this paper is in fact a monotone CNF predicate.

PROPOSITION 4.6. Predicates $\sigma_c$, $\sigma_f$, $\sigma_n$, and $\sigma_s$ are monotone CNF predicates.

PROOF. Fix an itemset $X = x_1 \cdots x_N$ and $K$, the total number of items. Let $\Omega = \{0, 1\}^K$ be the collection of all binary vectors of length $K$.

Free itemsets.
Let $B_i = \{t \in \Omega \mid t_{x_i} = 0,\ t_{x_j} = 1 \text{ for } j \neq i,\ 1 \leq j \leq N\}$ for $i = 1, \ldots, N$. In order for $X$ to be free in $D$, we must have $D \cap B_i \neq \emptyset$. Otherwise, $sp(X) = sp(X \setminus \{x_i\})$, making $X$ not free.

Closed itemsets.
Define $K - N$ sets by $B_i = \{t \in \Omega \mid t_{x_j} = 1 \text{ for } 1 \leq j \leq N,\ t_i = 0\}$ for $i \notin X$. $X$ is closed in $D$ if and only if $D \cap B_i \neq \emptyset$ for all such $i$. Otherwise, $sp(X) = sp(X \cup \{i\})$, making $X$ not closed.

Totally shattered itemsets.
Define $2^N$ sets by $B_u = \{t \in \Omega \mid t_{x_j} = u_j,\ 1 \leq j \leq N\}$ for each $u \in \{0, 1\}^N$. The proposition follows directly from the definition.
Non-derivable itemsets.
Let $C_u = \{t \in \Omega \mid t_{x_j} = u_j,\ 1 \leq j \leq N\}$ for each $u \in \{0, 1\}^N$. Define $2^{2N-2}$ sets by $B_{u,v} = C_u \cup C_v$, where $u, v \in \{0, 1\}^N$, $u$ has an odd number of 1s and $v$ has an even number of 1s. The proposition follows directly from Lemma 3.11.

Example 4.7. In our running example, the itemset $bde$ is closed if and only if $D$ contains at least one transaction from $B_1 = \{(0,1,0,1,1), (0,1,1,1,1)\}$ and one from $B_2 = \{(0,1,0,1,1), (1,1,0,1,1)\}$. The dataset does contain $(0,1,0,1,1)$, making $bde$ closed.

In order to prove the main result we need the following lemma, showing that the robustness of a monotone CNF predicate can be expressed in a certain form. We can then exploit this expression in Proposition 4.9.
LEMMA 4.8. Let $\sigma$ be a monotone CNF predicate and let $X$ be an itemset. Let $K$ be the number of items. Then there is a set of coefficients $\{c_i\}_{i=1}^{N}$ and a collection $\{S_i\}_{i=1}^{N}$ of sets of binary vectors of length $K$ such that
\[ r(X; \sigma, D, \alpha) = \sum_{i=1}^{N} c_i (1-\alpha)^{|D \cap S_i|}. \]

PROOF. Let $S$ be a set of binary vectors of length $K$. The probability of a random subsample $D_\alpha$ not containing a transaction from $S$ is equal to
\[ p(D_\alpha \cap S = \emptyset) = (1-\alpha)^{|D \cap S|}. \]
We can rewrite the robustness using the inclusion-exclusion principle,
\[ r(X; \sigma, D, \alpha) = 1 - p(\sigma(X; D_\alpha) = 0) = 1 - p(D_\alpha \cap B_1 = \emptyset \vee \cdots \vee D_\alpha \cap B_L = \emptyset) = 1 - \sum_{i=1}^{L} p(D_\alpha \cap B_i = \emptyset) + \sum_{1 \leq i < j \leq L} p(D_\alpha \cap (B_i \cup B_j) = \emptyset) - \cdots. \]
Each term of this expansion has the form $\pm\, p(D_\alpha \cap S = \emptyset) = \pm (1-\alpha)^{|D \cap S|}$, where $S$ is a union of some of the sets $B_i$, which gives the claimed form.
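For the closedness predicate this construction is small enough to verify directly. The Python sketch below (ours; names hypothetical) materializes the sets $B_i$ for $X = bde$ over the items $a$–$e$ and rebuilds the robustness in the form promised by Lemma 4.8.

```python
from itertools import product, combinations

ITEMS = 'abcde'

def cnf_sets_closed(X):
    """The sets B_i for sigma_c: one per item i outside X, containing the
    length-|ITEMS| binary vectors with 1 on all of X and 0 on i."""
    idx = [ITEMS.index(x) for x in X]
    out = []
    for i, item in enumerate(ITEMS):
        if item in X:
            continue
        out.append({t for t in product((0, 1), repeat=len(ITEMS))
                    if all(t[j] == 1 for j in idx) and t[i] == 0})
    return out

def robustness_cnf(data, b_sets, alpha):
    """Inclusion-exclusion over the events 'D_alpha misses B_i'; each term
    is +/- (1-alpha)^{|D cap S|} with S a union of B_i's (Lemma 4.8)."""
    total = 1.0
    for k in range(1, len(b_sets) + 1):
        for group in combinations(b_sets, k):
            union = set().union(*group)
            overlap = sum(1 for t in data if t in union)
            total += (-1) ** k * (1 - alpha) ** overlap
    return total

# Toy dataset (consistent with the running example) as binary vectors.
D = [(1, 1, 1, 1, 1), (1, 1, 1, 1, 1), (0, 1, 0, 1, 1), (0, 1, 0, 1, 1),
     (1, 0, 0, 0, 0), (0, 0, 0, 0, 1)]
```

For $bde$ this reproduces the two sets of Example 4.7 and the value $1 - (1-\alpha)^2$ obtained earlier.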
PROPOSITION 4.9. Let $\sigma$ be a monotone CNF predicate and let $D$ be a dataset. Let $X$ and $Y$ be itemsets such that $X \prec_\sigma Y$ in $D$. Let $q$ be the empirical distribution of $D$ and let $R_m$ be a dataset of $m$ random transactions drawn from $q$. Let $0 < \alpha < 1$. Then there is $M$ such that $E[r(X; \sigma, R_m, \alpha)] < E[r(Y; \sigma, R_m, \alpha)]$ for $m > M$.

PROOF. Let us write $\beta = 1 - \alpha$. Lemma 4.8 says that we can write the difference in robustness as
\[ r(Y; \sigma, R_m, \alpha) - r(X; \sigma, R_m, \alpha) = \sum_{i=1}^{N} c_i \beta^{|R_m \cap S_i|} \]
for certain coefficients $\{c_i\}_{i=1}^{N}$ and sets of binary vectors $\{S_i\}_{i=1}^{N}$. Let $d_k = \sum_{|D \cap S_i| = k} c_i$. Since $X \prec_\sigma Y$, Lemma 4.3 implies that there is $l$ such that $d_l > 0$ and $d_k = 0$ for $k < l$. Let $S$ be a set of binary transactions, and let $k = |S \cap D|$; that is, the probability of generating a random transaction belonging to $S$ is $q(t \in S) = k/|D|$. We have
\[ E\big[\beta^{|S \cap R_m|}\big] = \sum_{j=0}^{m} \beta^j \binom{m}{j} q(t \in S)^j (1 - q(t \in S))^{m-j} = \Big(\beta \frac{k}{|D|} + 1 - \frac{k}{|D|}\Big)^m. \]
We will write $t_k$ as shorthand for the right-hand side of the equation. Note that since $\beta < 1$, we have $t_{k+1} < t_k$. We can write the expected difference between the robustness values as
\[ E\Big[\sum_{i=1}^{N} c_i \beta^{|R_m \cap S_i|}\Big] = \sum_{k=0}^{|D|} d_k t_k^m = d_l t_l^m + \sum_{k=l+1}^{|D|} d_k t_k^m = t_l^m \Big(d_l + \sum_{k=l+1}^{|D|} d_k (t_k/t_l)^m\Big). \]
Since $t_k/t_l < 1$, the terms $(t_k/t_l)^m$ approach $0$ as $m$ goes to infinity. Hence, there is $M$ such that the sum on the right-hand side of the equation is larger than $-d_l$ for $m > M$. This guarantees that the difference is positive, proving the proposition.

This proposition suggests that a ranking based on a fixed $\alpha$ and the parameter-free ranking will eventually agree if the dataset is large enough.
In other words, $\beta$ in Proposition 4.1 will get smaller (on average) as the size of the dataset increases. We will see this phenomenon later on in Propositions 5.8 and 5.9.
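The closed-form expectation used in the proof of Proposition 4.9 is a plain binomial identity and can be sanity-checked numerically (sketch ours; function names hypothetical):

```python
from math import comb

def expected_overlap_power(beta, k, n, m):
    """E[beta^{|S cap R_m|}] when each of the m i.i.d. transactions lands
    in S with probability k/n: the closed form (beta*k/n + 1 - k/n)^m."""
    return (beta * k / n + 1 - k / n) ** m

def expected_overlap_power_direct(beta, k, n, m):
    """The same expectation via the defining binomial sum."""
    q = k / n
    return sum(beta ** j * comb(m, j) * q ** j * (1 - q) ** (m - j)
               for j in range(m + 1))
```

Both functions agree up to floating-point error, which is exactly the step where the binomial theorem is applied in the proof.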
5. COMPUTING ORDER IN PRACTICE
In this section we demonstrate how we can compute the ranking for free, non-derivable, and totally shattered itemsets, and how we can estimate the ranking for closed itemsets. For the computational complexity, see Table I.
We will first demonstrate that we can compute the order for free and totally shattered itemsets without finding an appropriate $\alpha$. We will do this by analyzing the coefficients of the measure viewed as a polynomial of $1 - \alpha$. Note that for free and totally shattered itemsets these polynomials are given in Proposition 3.13 and Proposition 3.14. In order to obtain the coefficients we could simply expand the polynomials. However, the polynomials in Proposition 3.13 and Proposition 3.14 are regular enough that we can compute the order without expanding them. In order to do so we need the following definition for ordering sequences.

Definition 5.1. Given two non-decreasing sequences $s = s_1, \ldots, s_K$ and $t = t_1, \ldots, t_N$, we write $s \prec t$ if either there is $n$ such that $s_n < t_n$ and $s_i = t_i$ for all $i < n$, or $t$ is a proper prefix sequence of $s$, that is, $s_i = t_i$ for $i \leq N < K$. We write $s \preceq t$ if $s = t$ or $s \prec t$.

The following proposition will allow us to order itemsets without expanding the polynomials in Propositions 3.13–3.14.
PROPOSITION 5.2. Assume two polynomials
\[ f(\alpha) = \prod_{i=1}^{K} (1 - (1-\alpha)^{s_i}) \quad \text{and} \quad g(\alpha) = \prod_{i=1}^{N} (1 - (1-\alpha)^{t_i}), \]
where $s = s_1, \ldots, s_K$ and $t = t_1, \ldots, t_N$ are non-decreasing sequences of integers, $s_i, t_i \geq 0$. If $t \preceq s$, then there is a $\beta < 1$ such that $\beta \leq \alpha \leq 1$ implies $f(\alpha) \geq g(\alpha)$.

PROOF. The case $s = t$ is trivial, hence we assume that $s \neq t$. If $s_1 = 0$ or $t_1 = 0$, then $f(\alpha) = 0$ or $g(\alpha) = 0$, and the result follows; hence we will assume that $s_i, t_i > 0$. Let $\{a_i\}$ and $\{b_i\}$ be coefficients such that
\[ f(\alpha) = \sum_i a_i (1-\alpha)^i \quad \text{and} \quad g(\alpha) = \sum_i b_i (1-\alpha)^i. \]
Let $I_n$ be the collection of all subsequences of $s$ that sum to $n$,
\[ I_n = \Big\{u \;\Big|\; u \text{ is a subsequence of } s,\ \sum_{i=1}^{|u|} u_i = n\Big\}. \]
Similarly, let $J_n$ be the collection of all subsequences of $t$ that sum to $n$. We can rewrite $f(\alpha)$ as
\[ f(\alpha) = \sum_{u \text{ is a subseq. of } s} (-1)^{|u|} (1-\alpha)^{\sum_i u_i}, \]
which implies that
\[ a_n = \sum_{u \in I_n} (-1)^{|u|} \quad \text{and similarly} \quad b_n = \sum_{u \in J_n} (-1)^{|u|}. \]
Assume that $t \prec s$. If $s$ is a prefix sequence of $t$, then
\[ g(\alpha) = f(\alpha) \prod_{i=K+1}^{N} (1 - (1-\alpha)^{t_i}) \leq f(\alpha), \]
which proves the proposition. Otherwise, let $n$ be as given in Definition 5.1, so that $t_n < s_n$. For every $i < t_n$, the subsequences in $I_i$ and $J_i$ contain only entries of $s$ and $t$ with indices smaller than $n$. Since $s$ and $t$ are identical up to $n$, it follows that $I_i = J_i$ and consequently $a_i = b_i$. Let $u \in I_{t_n}$ and assume that $|u| > 1$. Since we assume that $s_i > 0$, $u$ is a subsequence of $s_1, \ldots, s_{n-1}$. This means that we will find the same subsequence in $J_{t_n}$. Let $A$ be the number of singleton sequences in $I_{t_n}$, $A = |\{u \in I_{t_n} \mid |u| = 1\}|$, and let $B$ be the number of singleton sequences in $J_{t_n}$. These singleton sequences correspond to the entries in $s$ and $t$ having the same value as $t_n$.
Since $s$ and $t$ are identical up to $n$ and $s$ does not contain the value $t_n$ from position $n$ onwards, it holds that $B > A$. We now have $a_{t_n} - b_{t_n} = B - A > 0$. Lemma 4.3 now implies that $f(1-x) \geq g(1-x)$ when $x$ is close to $0$. Writing $\alpha = 1 - x$ completes the proof.

The polynomials in Propositions 3.13–3.14 have the form used in Proposition 5.2. Consequently, we can use the proposition to order itemsets. In order to do that we need the following definitions.

Definition 5.3. Given a dataset $D$ and an itemset $X$, we define a free margin vector $mv(X; D, \sigma_f)$ to be the sequence of $|X|$ integers $sp(X = v; D)$, where $v$ ranges over the binary vectors having $|X| - 1$ ones, ordered in increasing order. Similarly, we define a totally shattered margin vector $mv(X; D, \sigma_s)$ to be the sequence of $2^{|X|}$ integers $sp(X = v; D)$, where $v$ ranges over all binary vectors of length $|X|$, ordered in increasing order.
COROLLARY 5.4. Given itemsets $X$ and $Y$ and a dataset $D$, $X \preceq_{\sigma_f} Y$ if and only if $mv(X; D, \sigma_f) \preceq mv(Y; D, \sigma_f)$.
COROLLARY 5.5. Given itemsets $X$ and $Y$ and a dataset $D$, $X \preceq_{\sigma_s} Y$ if and only if $mv(X; D, \sigma_s) \preceq mv(Y; D, \sigma_s)$.
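Corollaries 5.4 and 5.5 turn ranking into sequence comparison. A short Python sketch (ours; helper names hypothetical, and the dataset chosen to be consistent with the running example's supports):

```python
def free_margin_vector(X, data):
    """mv(X; D, sigma_f): the |X| supports sp(X = v) over the vectors v
    with exactly |X| - 1 ones, sorted increasingly."""
    mv = []
    for hole in X:
        rest = X - {hole}
        mv.append(sum(1 for t in data if rest <= t and hole not in t))
    return sorted(mv)

def seq_leq(s, t):
    """s precedes-or-equals t per Definition 5.1: the first strict
    difference decides; otherwise the longer sequence ranks lower."""
    for a, b in zip(s, t):
        if a != b:
            return a < b
    return len(s) >= len(t)

def free_robustness(X, data, alpha):
    """Product form of the free-itemset robustness used by Propositions
    5.2 and 5.8: prod over margin entries m of (1 - (1-alpha)^m)."""
    p = 1.0
    for m in free_margin_vector(X, data):
        p *= 1 - (1 - alpha) ** m
    return p

# Toy dataset: mv(ab) = [1, 2] and mv(ae) = [1, 3], so ab ranks below ae.
D = [set('abcde'), set('abcde'), set('bde'), set('bde'), set('a'), set('e')]
```

The test also exercises the bound of Proposition 5.8: here $d = 1$ and $|Y| = 2$, so the robustness order must already agree with $\preceq_{\sigma_f}$ for $\alpha \geq 1 - 1/3 = 2/3$.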
Example 5.6. In our running example, $sp(ab = (1,0)) = 1$ and $sp(ab = (0,1)) = 2$, hence the free margin vector is equal to $mv(ab; \sigma_f) = [1, 2]$. Similarly, we have $sp(ae = (1,0)) = 1$ and $sp(ae = (0,1)) = 3$, hence the free margin vector is equal to $mv(ae; \sigma_f) = [1, 3]$. Hence, we conclude that $ab \prec_{\sigma_f} ae$.

Margin vectors are useful for determining the order of robust itemsets. However, we can also use them to provide a bound for $\beta$ given in Definition 4.2. More specifically, the further the margin vectors are from each other, the lower $\alpha$ can be such that the robustness still agrees with the order. To make this formal, we will need the following definition.

Definition 5.7. Assume two non-decreasing sequences $s = s_1, \ldots, s_K$ and $t = t_1, \ldots, t_N$ such that $s \preceq t$. Let $n$ be the first index such that $s_n < t_n$; we define $d(s, t) = t_n - s_n$. If no such index exists, that is, $t$ is a prefix sequence of $s$, we define $d(s, t) = \infty$.

The following propositions state that the larger $d(s, t)$ is, the lower $\alpha$ can be. This reflects the result of Proposition 4.9: large datasets will result in large differences in margin vectors, allowing $\alpha$ to be small.
PROPOSITION 5.8. Assume itemsets $X$ and $Y$ and a dataset $D$ such that $X \preceq_{\sigma_f} Y$. Let $d = d(mv(X; D, \sigma_f), mv(Y; D, \sigma_f))$. Then $r(X; \sigma_f, D, \alpha) \leq r(Y; \sigma_f, D, \alpha)$ for
\[ \alpha \geq 1 - \frac{1}{\sqrt[d]{|Y| + 1}}. \]
PROPOSITION 5.9. Assume itemsets $X$ and $Y$ and a dataset $D$ such that $X \preceq_{\sigma_s} Y$. Let $d = d(mv(X; D, \sigma_s), mv(Y; D, \sigma_s))$. Then $r(X; \sigma_s, D, \alpha) \leq r(Y; \sigma_s, D, \alpha)$ for
\[ \alpha \geq 1 - \frac{1}{\sqrt[d]{2^{|Y|} + 1}}. \]

Both propositions follow immediately from the following proposition.
PROPOSITION 5.10. Given two non-decreasing sequences $s = s_1, \ldots, s_K$ and $t = t_1, \ldots, t_N$ such that $s \preceq t$, let $d = d(s, t)$. Then
\[ \prod_{i=1}^{K} (1 - (1-\alpha)^{s_i}) \leq \prod_{i=1}^{N} (1 - (1-\alpha)^{t_i}) \quad \text{for} \quad \alpha \geq 1 - \frac{1}{\sqrt[d]{N + 1}}. \]

PROOF. If $t$ is a prefix sequence of $s$, then the inequality holds for any $\alpha$. Assume that $t$ is not a prefix sequence and let $n$ be the first index such that $s_n < t_n$. Write $\beta = 1 - \alpha$. We can upper bound the left-hand side by
\[ \prod_{i=1}^{K} (1 - \beta^{s_i}) \leq (1 - \beta^{s_n}) \prod_{i=1}^{n-1} (1 - \beta^{s_i}) \]
and lower bound the right-hand side by
\[ \prod_{i=1}^{N} (1 - \beta^{t_i}) \geq (1 - \beta^{s_n + d})^N \prod_{i=1}^{n-1} (1 - \beta^{s_i}). \]
Hence it is sufficient to show that $(1 - \beta^{s_n}) \leq (1 - \beta^{s_n + d})^N$, or
\[ \log(1 - \beta^{s_n}) \leq N \log(1 - \beta^{s_n + d}). \]
We apply the inequalities $-x \leq \log(1 - x) \leq -x/(1 - x)$, by which it suffices that
\[ -\beta^{s_n} \leq -N \frac{\beta^{s_n + d}}{1 - \beta^{s_n + d}}, \quad \text{or equivalently} \quad 1 \geq N \beta^d + \beta^{s_n + d}. \]
Since $\beta^d \geq \beta^{s_n + d}$, it is sufficient to have $1 \geq (1 + N) \beta^d$. This is true for $\beta \leq 1/\sqrt[d]{N + 1}$.

In this section we will introduce a technique for estimating the ranking for closed itemsets. As the measure for closed itemsets has a different form than for free or totally shattered itemsets, we are forced to seek alternative approaches. We approach the problem by first expressing the coefficients of the polynomial with the supports of closed itemsets. Then we estimate the polynomial by considering only the most frequent closed itemsets.

Let us consider Proposition 3.16. Let $a_k$ be the coefficient of the $k$th term of the polynomial for $r(X; \sigma_c, \alpha)$ given in Proposition 3.16. If we can compute these numbers efficiently, we can use Lemma 4.3 to find the ranking. We will do this by first expressing $a_k$ using closed itemsets. In order to do that, let $cl(X)$ be the closure of an itemset $X$. Let us define
\[ e(Y, X) = \sum_{Z \supseteq X,\ cl(Z) = Y} (-1)^{|Z| - |X|} \]
to be the alternating sum over all itemsets containing $X$ and having $Y$ as their closure. Since all the itemsets having the same closure have the same support, we can write the coefficients $a_k$ using $e(Y, X)$,
\[ a_k = \sum_{\substack{Y \supseteq X \\ sp(X) - sp(Y) = k}} (-1)^{|Y| - |X|} = \sum_{\substack{Y \supseteq X,\ Y = cl(Y) \\ sp(X) - sp(Y) = k}} e(Y, X). \tag{3} \]
To compute $e(Y, X)$, first note that $e(X, X) = 1$. If $Y \neq X$, then using the identity
\[ \sum_{\substack{Y \supseteq Y' \supseteq X \\ Y' = cl(Y')}} e(Y', X) = \sum_{Y \supseteq Z \supseteq X} (-1)^{|Z| - |X|} = 0 \]
we arrive at
\[ e(Y, X) = -\sum_{\substack{Y \supsetneq Y' \supseteq X \\ Y' = cl(Y')}} e(Y', X). \tag{4} \]
Thus, we can compute $e(Y, X)$ from the values $e(Y', X)$, where $Y'$ is a closed subset of $Y$.
This is convenient, because when computing $e(Y, X)$, say for $a_k$, we have already computed the values for all the closed subsets of $Y$ for previous coefficients.

Example. Consider the itemset $e$ in our running example. There are two closed supersets of $e$, namely $bde$ and $abcde$, having the supports $4$ and $2$, respectively. Using the update equations, we see that $e(e, e) = 1$, $e(bde, e) = -1$, and $e(abcde, e) = 0$. As $sp(e) = 5$, we see that the non-zero coefficients $a_i$ are $a_0 = 1$ and $a_1 = -1$.

The problem with this approach is that we can still have an exponential number of closed itemsets. Hence, we chose to estimate the ranking by only using frequent closed itemsets and estimating the remaining itemsets to have a support of $0$.
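The recursion of Eqs. 3 and 4 is short to implement. The following Python sketch (ours; it assumes the input dictionary contains $X$ itself together with its closed supersets, as in the example above) returns the non-zero coefficients:

```python
def closed_coefficients(X, closed_supports):
    """Coefficients {k: a_k} of r(X; sigma_c, alpha) as a polynomial in
    (1 - alpha), computed from closed itemsets via e(Y, X) (Eqs. 3-4).
    `closed_supports` maps frozensets (including X) to their supports."""
    C = {Y: sp for Y, sp in closed_supports.items() if X <= Y}
    spx = C[X]
    e = {X: 1}
    a = {0: 1}
    for Y in sorted(C, key=len):        # subsets are visited before supersets
        if Y == X:
            continue
        e[Y] = -sum(e[Z] for Z in e if Z < Y)   # Eq. 4
        k = spx - C[Y]
        a[k] = a.get(k, 0) + e[Y]               # Eq. 3
    return a
```

For the running example, $X = \{e\}$ with closed supersets $bde$ (support 4) and $abcde$ (support 2) yields $a_0 = 1$, $a_1 = -1$, and $a_3 = 0$.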
This estimation is achieved by removing all non-frequent closed itemsets from the sums of Eqs. 3 and 4 and adding an itemset containing all the items and having support $0$. The code for this estimation is given in Algorithm 1.

Algorithm 1:
Algorithm for estimating the coefficients of the polynomial given in Proposition 3.16.
input: $X$, an itemset; $\mathcal{C}$, frequent closed itemsets
output: $\{a_k\}$, coefficients of the polynomial

if $A \notin \mathcal{C}$ then add $A$ to $\mathcal{C}$ with $sp(A) = 0$;
$\mathcal{C} \leftarrow \{Y \in \mathcal{C} \mid X \subseteq Y\}$;
$L \leftarrow$ sets in $\mathcal{C}$ ordered by the subset relation;
$e(X, X) \leftarrow 1$;
for $Y \in L$ do
&nbsp;&nbsp;$e(Y, X) \leftarrow -\sum_{Z \in \mathcal{C},\ Z \subsetneq Y} e(Z, X)$;
&nbsp;&nbsp;$k \leftarrow sp(X) - sp(Y)$;
&nbsp;&nbsp;$a_k \leftarrow a_k + e(Y, X)$;

Algorithm 1 takes $O(|\mathcal{C}|^2)$ time. In practice, it is much faster because an average itemset does not have that many supersets.

Now that we have a way of estimating $a_k$ from frequent closed itemsets, we can, given two itemsets $X$ and $Y$, search for the smallest $k$ for which the coefficients differ in order to apply Lemma 4.3. Note that if the index of the first differing coefficient, say $k$, is such that $sp(X) - k$ is larger than or equal to the support threshold, then $a_k$ is correctly computed by our estimation, and our approximation yields a correct ranking.

In this section we will discuss how to compute the ranking of non-derivable itemsets. The ranking for non-derivable itemsets is particularly difficult because we cannot use Proposition 5.2 to avoid expanding the polynomial given in Proposition 3.15. We can, however, expand the polynomial since, due to Eq. 1, it has only $|D|$ terms. Once we have expanded the polynomial, we can use Lemma 4.3 to compare the itemsets.

First note that we can rewrite the measure as
\[ r(X; \sigma_n, \alpha) = o(X, \alpha, V) + o(X, \alpha, W) - o(X, \alpha, U), \tag{5} \]
where $U$ consists of all binary vectors of length $|X|$, $V$ is the subset of $U$ containing the vectors having an odd number of ones, and $W = U \setminus V$. Next, we will show how to expand a term $o(X, \alpha, S)$ for any set of binary vectors $S$. Once we are able to do that, we can expand each term in Eq. 5 individually to compute the final coefficients.
In order to do that, we will use the identity
\[ (1 - x^a) \sum_{i=0}^{N} c_i x^i = \sum_{i=0}^{N+a} (c_i - c_{i-a}) x^i, \]
where on the right-hand side we define $c_i = 0$ for $i < 0$ or $i > N$. This gives us a simple iterative procedure, given in Algorithm 2: for each $v \in S$, we shift the current coefficients by $sp(X = v)$ and subtract the result from the current coefficients.

The highest degree in the polynomial will be $\sum_{v \in S} sp(X = v)$. Since each $v$ is unique in $S$, this number is bounded by $|D|$. This means that we have to consider only $|D| + 1$ coefficients and that the computational complexity of EXPAND is $O(|S| |D|)$. Consequently, computing the coefficients in Eq. 5 will take $O(|U| |D|) = O(2^{|X|} |D|)$ time. We can further speed this up by using sparse vectors and computing the terms in a lazy fashion during the comparison.

Algorithm 2: EXPAND($X, D, S$), expands the polynomial $o(X, D, S)$
input: $X$, an itemset; $D$, a dataset; $S$, a set of vectors
output: $\{c_i\}_{i=0}^{|D|}$, the coefficients of the polynomial $o(X, D, S)$

$c_i \leftarrow 0$ for $i = 0, \ldots, |D|$;
$c_0 \leftarrow 1$;
foreach $v \in S$ do
&nbsp;&nbsp;$s \leftarrow sp(X = v)$;
&nbsp;&nbsp;$n_i \leftarrow 0$ for $i = 0, \ldots, s - 1$;
&nbsp;&nbsp;$n_i \leftarrow c_{i-s}$ for $i = s, \ldots, |D|$;
&nbsp;&nbsp;$c_i \leftarrow c_i - n_i$ for $i = 0, \ldots, |D|$;
return $\{c_i\}_{i=0}^{|D|}$;

Example. Consider the itemset $ac$ in our running example. We have $sp(ac = (0,0)) = 3$, $sp(ac) = sp(ac = (1,1)) = 2$, $sp(ac = (1,0)) = 1$, and $sp(ac = (0,1)) = 0$. Let $V = \{(0,1), (1,0)\}$, $W = \{(0,0), (1,1)\}$, and $U = V \cup W$. Since $sp(ac = (0,1)) = 0$, both $o(X, \alpha, V)$ and $o(X, \alpha, U)$ are $0$. We have
\[ o(X, \alpha, W) = (1 - (1-\alpha)^3)(1 - (1-\alpha)^2) = 1 - (1-\alpha)^2 - (1-\alpha)^3 + (1-\alpha)^5. \]
Consequently, EXPAND will return $(1, 0, -1, -1, 0, 1, 0)$ as the coefficients.
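Algorithm 2 translates almost line by line into Python (sketch ours; checked against the worked example for $ac$):

```python
def expand(margins, n):
    """EXPAND: coefficients c_0..c_n of prod_v (1 - x^{sp(X=v)}) viewed as
    a polynomial in x = 1 - alpha; `margins` lists sp(X = v) for v in S."""
    c = [0] * (n + 1)
    c[0] = 1
    for s in margins:
        shifted = [0] * (n + 1)
        for i in range(s, n + 1):
            shifted[i] = c[i - s]       # shift coefficients by s ...
        c = [ci - si for ci, si in zip(c, shifted)]   # ... and subtract
    return c

# With margins [3, 2] and |D| = 6 this reproduces (1, 0, -1, -1, 0, 1, 0).
coeffs = expand([3, 2], 6)
```

Evaluating the returned polynomial at any point matches the product form directly, which is a convenient correctness check.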
6. EXPERIMENTS
In this section we present our experiments.
— We study the typical behavior of robustness for free, totally shattered, and non-derivable itemsets as a function of $\alpha$.
— We test how similar the rankings based on robustness and based on the order $\prec_\sigma$ are.
— We test how the ranking of robust closed itemsets changes under the effect of noise.
In addition, we provide examples of top-$k$ robust closed and free itemsets.

We used datasets from three repositories. The 8 FIMI [Goethals and Zaki 2003] datasets include large transaction datasets derived from traffic data, census data, and retail data. Two datasets are synthetically generated to simulate market basket data. The datasets from the UCI Machine Learning Repository [Asuncion and Newman 2007] represent classification problems from a wide variety of domains. We used the itemset representations of 29 datasets from the LUCS repository [Coenen 2003]. Finally, we used 18 text datasets shipped with the Cluto clustering toolkit [Zhao and Karypis 2002], converted to itemsets using a binary representation of words in documents, discarding the term frequencies.
The goal of the first experiment is to show that this new constraint for itemsets can significantly reduce the number of itemsets reported in the results by removing itemsets that are spurious in the sense that they are unlikely to be observed on many subsamples. Throughout this section we will use $\alpha$ for the size of the data sample, $\rho$ for the minimum robustness threshold, and $\tau$ for the minimum support threshold.
Our first question is how the parameters should be chosen. It is clear that if we choose α very close to 1, then even itemsets that would lose their predicate by removing only a few transactions still have a high likelihood of being found. We would thus expect most robustness values to be close to 1 when α is close to 1. This would make choosing a suitable ρ very difficult and might lead to problems due to floating-point arithmetic. Similarly, choosing α close to 0 will cause most itemsets to have a very low likelihood of still being found, and thus most robustness values will be close to 0. Choosing a medium α is therefore most useful for emphasizing the quantitative differences between itemsets of various robustness.

As for the minimum robustness threshold ρ, the larger its value is, the stricter the filtering will be. Choosing the threshold is somewhat application dependent, but it should not be close to zero, otherwise no reduction will be observed.

To confirm our reasoning we performed a parameter study on the itemset version of the Zoo dataset, which describes 101 animals with 42 boolean attributes. This data contains free itemsets, non-derivable itemsets, and totally shattered itemsets (at minimum support τ = 0.). The number of itemsets as a function of α and ρ is given in Figure 1. As expected,

— for large α all but the largest ρ do not reduce the number of itemsets reported,
— as α becomes smaller, the itemsets are spread smoothly across the range of ρ, allowing a meaningful quantitative evaluation,
— for small α almost no itemsets are reported even for very small ρ.

Fig. 1. Number of free (a), non-derivable (b), and totally shattered (c) itemsets on the Zoo (τ = 0.) dataset as a function of α and ρ.

In order to evaluate whether this holds for more datasets, we computed the number of free/non-derivable/totally shattered itemsets using different values of α and normalized this by the number of robust itemsets exceeding the minimal robustness threshold of 0.. In order to minimize the variance of the behavior of the robustness in a single dataset, we consider an average over all test datasets, which we give in Figure 2. We see the same phenomenon as in Figure 1. Large values of α induce a skewed distribution which becomes more balanced as we decrease the value of α. Consider α = 0.. Our test datasets typically contain a lot of itemsets having only one transaction keeping them from becoming non-free. This can be seen as a dip of the curve for α = 0. at ρ = 0. in Figure 2(a). A second dip at ρ = 0. represents the itemsets that can be made non-free by deleting two transactions. As we make α smaller, these dips become less prominent.

Fig. 2. Average of the number of free (a), totally shattered (b), and non-derivable (c) itemsets as a function of ρ, normalized by the number of itemsets for ρ = 0.. The average is taken over all test datasets.

Based on this we chose α = 0. and plotted the number of free itemsets as a function of ρ. Figure 3(a) shows that for the Zoo dataset there are many free itemsets with very different robustness values, showing a rich structure that can be exploited to rank and reduce the number of itemsets. Similar results were observed for many of the UCI datasets. Figure 3(b) shows a representative example for the text datasets. While the distribution is much more skewed, a large ρ would also reduce the number of itemsets by about 50%. Finally, Figure 3(c) shows an example for a large transactional dataset with 88k transactions. Using α = 0. generated a distribution where all values were close to one, so we needed to set α = 0. to better show the quantitative differences between the itemsets. This demonstrates that the more transactions a dataset contains, the more skewed the distribution for a fixed α will be.

Fig. 3. Number of free itemsets as a function of ρ: (a) Zoo (α = 0., τ = 0.), (b) LA12 (α = 0., τ = 0.), (c) Retail (α = 0., τ = 0.).

Our next experiment is to see how robust closed itemsets behave when a dataset is exposed to noise.
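Before turning to that experiment, the robustness notion exercised in this parameter study can also be illustrated empirically. The paper computes robustness analytically; the sketch below merely estimates it by actually drawing α-subsamples, using the freeness property and a hypothetical toy dataset (all names and data are ours, for illustration only):

```python
import random

def support(dataset, itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in dataset if itemset <= t)

def is_free(dataset, itemset):
    """Free (generator): every proper subset has strictly larger support."""
    sp = support(dataset, itemset)
    return all(support(dataset, itemset - {i}) > sp for i in itemset)

def estimate_robustness(dataset, itemset, alpha, rounds=2000, seed=0):
    """Monte-Carlo estimate of the probability that `itemset` stays free in a
    random subsample that keeps each transaction with probability alpha."""
    rng = random.Random(seed)
    hits = sum(
        is_free([t for t in dataset if rng.random() < alpha], itemset)
        for _ in range(rounds)
    )
    return hits / rounds

# Toy data: {'a','b'} is free, but only barely -- removing the two a-only
# transactions (or the single b-only one) makes it non-free.
data = [{'a', 'b'}] * 50 + [{'a'}] * 2 + [{'b'}] + [set()] * 10
r = estimate_robustness(data, {'a', 'b'}, alpha=0.9)
# analytically: (1 - 0.1**2) * (1 - 0.1) = 0.891
```

With a large α such as 0.9 the estimate stays close to 1, and it drops quickly as α shrinks, matching the behavior described above.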
Our expectation is that the most robust itemsets will stay closed and be ranked high, while the ranking of the less robust itemsets will be more susceptible to noise.

In order to do this, we created from each dataset a synthetic dataset having the same dimensions by sampling from a distribution. The underlying distribution had the same margins as the original data but otherwise items were independent. We then mix the original data with the synthetic one, that is, an entry in a mixed dataset is an entry from the synthetic dataset with probability η, and is an entry from the original dataset with probability 1 − η. We tested two different noise levels, η = 0. and η = 0.. We mined approximately
10 000 frequent closed itemsets from each original dataset. If the dataset contained less than 10 000 itemsets, we set the threshold to one transaction. Using the same thresholds we mined closed itemsets from the mixed datasets. We sorted the itemsets using Algorithm 1.

Let X be an itemset ranked ith in the original data. Assume that X is ranked jth in the noisy data. We define the compliance of X as 1/(|i − j| + 1). The compliance will be 1 if i = j and decreases towards 0 the larger the distance is. The reason for using this particular definition is that we can naturally set the compliance to 0 if X is not found in the noisy data. The compliances for the top itemsets are given in Figure 4.

Fig. 4. Rank compliance of an itemset in noisy data as a function of robustness in the original data, for the two noise levels η (panels (a) and (b)). High compliance values imply that adding noise had little effect on the rank of an itemset. Median and quartiles are computed over all datasets.
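The noise-injection scheme and the compliance measure described above can be sketched as follows (a minimal illustration; the helper names and toy matrix are ours):

```python
import random

def mix_with_noise(data, eta, seed=0):
    """Mix a 0/1 transaction matrix with samples from an independence model
    having the same column margins: each entry comes from the synthetic model
    with probability eta and from the original data with probability 1 - eta."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    margins = [sum(row[j] for row in data) / n for j in range(m)]
    return [
        [(1 if rng.random() < margins[j] else 0) if rng.random() < eta else row[j]
         for j in range(m)]
        for row in data
    ]

def compliance(i, j):
    """Compliance 1/(|i - j| + 1) of an itemset ranked i in the original data
    and j in the noisy data; naturally 0 when the itemset is absent (j is None)."""
    if j is None:
        return 0.0
    return 1.0 / (abs(i - j) + 1)

data = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
assert mix_with_noise(data, eta=0.0) == data   # eta = 0 leaves the data untouched
print(compliance(3, 3), compliance(3, 7), compliance(3, None))  # 1.0 0.2 0.0
```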
From the figures we see that compliance stays high for robust itemsets and drops as we move further down the original ranking. That is, the more robust an itemset is, the less prone to noise it is. Adding more noise to the data implies less compliance. For example, for noise level η = 0., the top-60 itemsets had a compliance of 0.25 or higher in half of the datasets. This means that their rank changed by at most 3. On the other hand, for noise level η = 0., the top-50 itemsets had a compliance of 0.1 or higher in half of the datasets, in other words, ranks changed by at most 9.

Our next experiment was to compare the parameter-free ranking described in Section 4 against the rankings based on quantitative robustness given specific values of α. We expect that the rankings are similar for large α values and that the differences increase when we lower α. For comparison we used the number of discordant pairs to calculate a distance between the rankings, similar to Kendall's τ. A discordant pair is a pair of itemsets (X, Y) such that the first method ranks X higher than Y and the second method ranks Y higher than X. We normalize the number of observed discordant pairs by b, where b is the maximum number of discordant pairs; hence, we obtain a value between 0 and 1. If there are no ties in robustness, then b = N(N − 1)/2, where N is the number of itemsets. However, if ties are present, that is, the robustness induces a bucket order, then b = N(N − 1)/2 − Σ_i B_i(B_i − 1)/2, where B_i is the size of each bucket, a set of itemsets having the same robustness. Values close to 0 mean that the rankings are in agreement. Typical examples are given in Table II for the Mushroom and
Zoo datasets, along with the averages taken over all datasets. Surprisingly, the ranking distance is extremely small even for small values of α, showing that the parameter-free approach produces rankings similar to the rankings under most values of α. Starting at α = 0. for Mushroom, and for all α for Zoo, only about 1% of the pairs are discordant. We see that the values increase as we lower α, which is expected since the parameter-free approach is based on large α values.

Table II. Distance between parameter-free rankings and rankings based on α for Mushroom and
Zoo datasets. Low values imply that the rankings agree; the value range is 0–1. For each of Mushroom (τ = 0.), Zoo (τ = 0.), and the average over all datasets, the table reports the distance for free (free), totally shattered (ts), and non-derivable (nd) itemsets over a range of α values.

Closed itemsets are often used for tasks requiring interpretation of the itemsets because, as the maximal elements of an equivalence class, they offer the most detailed description. We studied the highest-ranked closed itemsets for text datasets that are easily understood without domain knowledge. As an illustrative example, we used the re0 news dataset, from which we mine closed itemsets with minimum support τ = 0.. We ordered these itemsets using the estimation technique given in Section 5.2 and list the top 45 itemsets in Table III. The ranking is different from the one using support: less frequent (but more robust) itemsets are commonly ranked higher than frequent itemsets. For example, 'bank pct rate' occurs before the much more frequent itemset 'bank pct', showing that 'bank pct' is only closed in the full dataset due to relatively few documents using it without also using 'rate'.

Table III. Top-45 closed itemsets from re0 (τ = 0.) dataset.
 1. pct 792            16. week 310            31. canada 117
 2. bank 702           17. pct earlier 127     32. pct month 261
 3. trade 485          18. japan 318           33. econom 295
 4. billion 552        19. trade current 126   34. billion dlr mln 116
 5. market 554         20. dlr 472             35. told bank 116
 6. billion dlr 346    21. bank pct rate 287   36. told nation 116
 7. offici 342         22. dollar 336          37. pct japan 115
 8. mln 420            23. statem 122          38. pct adjust 115
 9. nation 323         24. committe 121        39. billion current 115
10. rate 566           25. nation month 121    40. european 114
11. bank market 369    26. ministri 120        41. month japan 114
12. foreign 331        27. pct rise 269        42. bank ad market 114
13. pct figur 132      28. bank pct 407        43. action 114
14. pct rate 418       29. pct rate feb 119    44. trade world 114
15. month 391          30. lead 118            45. nation japan 114
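For reference, the closedness property underlying Table III can be checked with a small sketch (toy documents of our own making; this is not the paper's mining algorithm):

```python
def support(data, itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in data if itemset <= t)

def is_closed(data, itemset):
    """Closed: no proper superset has the same support, i.e. no item outside
    the itemset is contained in every transaction that supports it."""
    covering = [t for t in data if itemset <= t]
    if not covering:
        return False
    outside = set().union(*covering) - itemset
    return all(any(i not in t for t in covering) for i in outside)

# Toy analogue of the 'bank pct' example: 'pct' rarely occurs without 'rate',
# so {'bank', 'pct'} is closed only thanks to two documents.
docs = [{'bank', 'pct', 'rate'}] * 40 + [{'bank', 'pct'}] * 2 + [{'bank'}] * 10
assert is_closed(docs, {'bank', 'pct'})   # two docs lack 'rate'
assert not is_closed(docs, {'pct'})       # every doc with 'pct' also has 'bank'
```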
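The ranking distance reported in Table II above (discordant pairs normalized for bucket orders) can be sketched as follows; rankings are assumed to be given as robustness scores, and the helper names are ours:

```python
from itertools import combinations

def ranking_distance(score_a, score_b):
    """Fraction of discordant pairs between two rankings, given as dicts
    itemset -> score.  Ties in score_a form buckets; within-bucket pairs are
    excluded, so the normalizer is b = N(N-1)/2 - sum_i B_i(B_i-1)/2."""
    discordant = b = 0
    for x, y in combinations(score_a, 2):
        if score_a[x] == score_a[y]:
            continue                      # same bucket: never discordant
        b += 1
        if (score_a[x] - score_a[y]) * (score_b[x] - score_b[y]) < 0:
            discordant += 1
    return discordant / b if b else 0.0

a = {'X': 3.0, 'Y': 2.0, 'Z': 1.0}
assert ranking_distance(a, a) == 0.0                               # identical
assert ranking_distance(a, {'X': 1.0, 'Y': 2.0, 'Z': 3.0}) == 1.0  # reversed
```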
Finally, we considered an alternative order by ranking itemsets based on how free they are. Note that a closed itemset is robust if the same transactions cannot be explained by a superset, whereas a free itemset is robust if the same transactions cannot be explained by a subset. For example, a singleton X will be ranked higher than a singleton Y if X has lower support. The reason for this is that it requires fewer transactions to be removed in order to make Y non-robust, namely the transactions not containing Y. We present the top-45 free non-singleton itemsets from the re0 news dataset in Table IV. These are frequent item pairs ab such that sp(ab) ≪ sp(a) and sp(ab) ≪ sp(b); that is, a non-robust free item pair ab would be such that if we would remove a singleton a (or b), then roughly the same transactions would still cover the pattern. An example of such a non-robust free itemset is 'bank assist'. This itemset is ranked 2 465 out of 2 558 itemsets. The support of this itemset is 96 but the support of 'assist' is 98; consequently, there are only two documents in which 'assist' occurs but not 'bank'.

Table IV. Top-45 free non-singleton itemsets from re0 (τ = 0.) dataset.

 1. billion rate 165    16. bank billion 287    31. govern dollar
 2. rate dlr 132        17. billion pct 288     32. foreign februari
 3. trade rate 146      18. rise dlr 118        33. januari dlr
 4. trade bank 154      19. pct dlr 210         34. monei dlr
 5. billion market 228  20. trade billion 223   35. dollar februari
 6. rate mln 109        21. bank dlr 211        36. rise japan
 7. trade pct 176       22. govern februari 77  37. februari japan
 8. bank pct 407        23. govern mln 90       38. dollar offici
 9. market rate 262     24. month dlr 141       39. rise offici
10. trade mln 130       25. trade dlr 222       40. pct mln
11. market dlr 186      26. pct market 306      41. februari dlr
12. month mln 106       27. market rise 133     42. foreign mln
13. trade market 203    28. februari offici 83  43. nation februari
14. rise mln 109        29. govern monei 77     44. januari mln
15. trade rise 115      30. januari govern 77   45. monei month
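The support-gap intuition behind robust free pairs can be sketched as follows: a pair ab is barely free (and hence non-robust) when min(sp(a), sp(b)) − sp(ab) is small, as with 'bank assist' above, where the gap is 98 − 96 = 2. The toy counts below are ours:

```python
def support(data, itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in data if itemset <= t)

def freeness_gap(data, a, b):
    """Number of transactions that must be removed to make the free pair
    {a, b} non-free: min(sp(a), sp(b)) - sp(ab)."""
    return min(support(data, {a}), support(data, {b})) - support(data, {a, b})

# Toy analogue of the 'bank assist' example.
docs = ([{'bank', 'assist'}] * 96 + [{'assist'}] * 2 +
        [{'bank'}] * 50 + [{'bank', 'rate'}] * 10 + [{'rate'}] * 40)
assert freeness_gap(docs, 'bank', 'assist') == 2   # barely free: non-robust
assert freeness_gap(docs, 'bank', 'rate') == 40    # comfortably free
```

Sorting pairs by this gap in increasing order pushes the least robust free pairs to the bottom of the ranking, mirroring the order used for Table IV.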
7. DISCUSSION
The experiments have shown that the number of itemsets can be largely reduced on many datasets when requiring a certain robustness. The fact that the results vary by dataset is another indication of the well-known fact that itemset data with different structures (dense vs. sparse, many items vs. many transactions) behave very differently in mining tasks.

We believe that robust itemsets can be beneficial for post-processing techniques such as [Bringmann and Zimmermann 2009] or [Vreeken et al. 2011] that use itemsets as their input and remove redundancy in the pattern set. Robust itemsets can be used as an alternative input, reducing their runtime without sacrificing performance. Also, robust itemsets could be used instead of closed itemsets as seeds to the AC-Close algorithm for approximate itemset mining [Cheng et al. 2006], improving its efficiency, which was criticized in [Gupta et al. 2008].

The ranking of itemsets by robustness presents a new interestingness measure that can be used to choose the top-k itemsets for interpretation or other data mining tasks. The intuition of robustness should be easy to understand for analysts, but which ranking is better for specific data mining tasks remains to be studied. In particular, it will be interesting to evaluate performance as features for classification tasks, in contrast to direct mining of prediction tasks. For interpretable classifiers one would want itemsets to be long, and thus use closed patterns. On the other hand, the desire is for an itemset to be present in unseen data with high likelihood, so free itemsets, as the minimal elements of an equivalence class, may generalize better. For both patterns we can ensure that they are present in many subsets of the training data without actually sampling, potentially alleviating the need for nested cross-validation.
8. SUMMARY
We have shown how robustness under subsampling for common classes of itemsets can be computed efficiently without actually sampling the data. The experimental results show that the number of reported itemsets can be largely reduced on many datasets; in other words, spurious itemsets that would not have been found in many subsets of the data are removed. The approach can further be used to rank itemsets for top-k mining by robustness. Future work will investigate the effect of using robust itemsets on data mining tasks such as clustering, classification, and rule generation using itemsets.

REFERENCES

AGRAWAL, R., IMIELINSKI, T., AND SWAMI, A. N. 1993. Mining association rules between sets of items in large databases. In SIGMOD. 207–216.
ASUNCION, A. AND NEWMAN, D. 2007. UCI machine learning repository.
BOULICAUT, J.-F., BYKOWSKI, A., AND RIGOTTI, C. 2000. Approximation of frequency queries by means of free-sets. In PKDD. 75–85.
BOULICAUT, J.-F., BYKOWSKI, A., AND RIGOTTI, C. 2003. Free-sets: A condensed representation of boolean data for the approximation of frequency queries. DMKD 7, 1, 5–22.
BRIN, S., MOTWANI, R., AND SILVERSTEIN, C. 1997. Beyond market baskets: Generalizing association rules to correlations. In SIGMOD. 265–276.
BRINGMANN, B. AND ZIMMERMANN, A. 2009. One in a million: picking the right patterns. KAIS 18, 1, 61–81.
CALDERS, T. AND GOETHALS, B. 2007. Non-derivable itemset mining. DMKD 14, 1, 171–206.
CALDERS, T., GOETHALS, B., AND MAMPAEY, M. 2007. Mining itemsets in the presence of missing values. In SAC. 404–408.
CALDERS, T., RIGOTTI, C., AND BOULICAUT, J.-F. 2006. A survey on condensed representations for frequent sets. In Constraint-Based Mining and Inductive Databases. 64–80.
CHENG, H., YAN, X., HAN, J., AND HSU, C. 2007. Discriminative frequent pattern analysis for effective classification. In ICDE. 716–725.
CHENG, H., YU, P. S., AND HAN, J. 2006. AC-Close: Efficiently mining approximate closed itemsets by core pattern recovery. In ICDM. IEEE, 839–844.
COENEN, F. 2003. The LUCS-KDD discretised/normalised ARM and CARM data library.
DE BIE, T. 2011. Maximum entropy models and subjective interestingness: an application to tiles in binary databases. 1–40.
GALLO, A., DE BIE, T., AND CRISTIANINI, N. 2007. Mini: Mining informative non-redundant itemsets. In ECMLPKDD. 438–445.
GEERTS, F., GOETHALS, B., AND MIELIKÄINEN, T. 2004. Tiling databases. In Proc. Discovery Science. 278–289.
GHOSHAL, G. AND BARABÁSI, A.-L. 2011. Ranking stability and super-stable nodes in complex networks. Nature Communications 2.
GIONIS, A., MANNILA, H., MIELIKÄINEN, T., AND TSAPARAS, P. 2007. Assessing data mining results via swap randomization. TKDD 1.
GOETHALS, B. AND ZAKI, M. 2003. FIMI '03, frequent itemset mining implementations. In ICDM 2003 Workshop, FIMI.
GUPTA, R., FANG, G., FIELD, B., STEINBACH, M., AND KUMAR, V. 2008. Quantitative evaluation of approximate frequent pattern mining algorithms. In KDD. 301–309.
HANHIJÄRVI, S., OJALA, M., VUOKKO, N., PUOLAMÄKI, K., TATTI, N., AND MANNILA, H. 2009. Tell me something I don't know: randomization strategies for iterative data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2009). 379–388.
HIPP, J., GÜNTZER, U., AND NAKHAEIZADEH, G. 2000. Algorithms for association rule mining - a general survey and comparison. SIGKDD Explorations 2, 1, 58–64.
LUCCHESE, C., ORLANDO, S., AND PEREGO, R. 2010. Mining top-k patterns from binary datasets in presence of noise. In ICDM.
MIELIKÄINEN, T. 2005. Transaction databases, frequent itemsets, and their condensed representations. In KDID. 139–164.
MISRA, G., GOLSHAN, B., AND TERZI, E. 2012. A framework for evaluating the smoothness of data-mining results. In ECMLPKDD 2012. 660–675.
MOERCHEN, F., THIES, M., AND ULTSCH, A. 2010. Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression. KAIS.
PASQUIER, N., BASTIDE, Y., TAOUIL, R., AND LAKHAL, L. 1999. Discovering frequent closed itemsets for association rules. In ICDT. 398–416.
PEI, J., HAN, J., AND LAKSHMANAN, L. V. S. 2001. Mining frequent itemsets with convertible constraints. In ICDE. 433–442.
SMETS, K. AND VREEKEN, J. 2011. The odd one out: Identifying and characterising anomalies. In SDM.
TATTI, N. 2008. Maximum entropy based significance of itemsets. KAIS 17, 1, 57–77.
TATTI, N. AND MOERCHEN, F. 2011. Finding robust itemsets under subsampling. In ICDM. 705–714.
UNO, T. AND ARIMURA, H. 2007. An efficient polynomial delay algorithm for pseudo frequent itemset mining. In Discovery Science. Springer, 219–230.
VREEKEN, J., VAN LEEUWEN, M., AND SIEBES, A. 2011. Krimp: mining itemsets that compress. DMKD 23, 1.
WANG, K., XU, C., AND LIU, B. 1999. Clustering transactions using large items. In CIKM. 483–490.
WEBB, G. I. 2007. Discovering significant patterns. Mach. Learn. 68, 1, 1–33.
XIN, D., HAN, J., YAN, X., AND CHENG, H. 2005. Mining compressed frequent-pattern sets. In VLDB. 709–720.
ZHAO, Y. AND KARYPIS, G. 2002. Evaluation of hierarchical clustering algorithms for document datasets. In CIKM. 515–524.