Model-based Exception Mining for Object-Relational Data
Fatemeh Riahi
School of Computing Science, Simon Fraser University, Burnaby, [email protected]
Oliver Schulte
School of Computing Science, Simon Fraser University, Burnaby, [email protected]
Abstract — This paper is based on a previous publication [29]. Our work extends exception mining and outlier detection to the case of object-relational data. Object-relational data represent a complex heterogeneous network [12], which comprises objects of different types, links among these objects, also of different types, and attributes of these links. This special structure prohibits a direct vectorial data representation. We follow the well-established Exceptional Model Mining framework, which leverages machine learning models for exception mining: an object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. Exceptional objects can be viewed as outliers. We apply state-of-the-art probabilistic modelling techniques for object-relational data that construct a graphical model (Bayesian network), which compactly represents probabilistic associations in the data. A new metric, derived from the learned object-relational model, quantifies the extent to which the individual association pattern of a potential outlier deviates from that of the whole population. The metric is based on the likelihood ratio of two parameter vectors: one that represents the population associations, and another that represents the individual associations. Our method is validated on synthetic datasets and on real-world datasets about soccer matches and movies. Compared to baseline methods, our novel transformed likelihood ratio achieved the best detection accuracy on all datasets.
I. INTRODUCTION: EXCEPTION MINING FOR RELATIONAL DATA
Exception mining is an important data analysis task in many domains. For relational data, exception mining supports outlier detection, where statistical deviations are viewed as due to a node or entity being genuinely exceptional, rather than due to statistical noise in the data. Statistical approaches to unsupervised exception/outlier detection are based on a generative model of the data [2]. The generative model represents normal behavior. An individual object is deemed an outlier if the model assigns sufficiently low likelihood to generating it. Following the well-established Exceptional Model Mining framework [10], we propose a new method for extending statistical outlier detection to the case of object-relational data using a novel likelihood-ratio comparison for generative probabilistic models. The object-relational data model is one of the main data models for structured data [18]. The main characteristics of objects that we utilize in this paper are the following. (1)
Object Identity.
Each object has a unique identifier that is the same across contexts. For example, a player has a name that identifies him in different matches. (2)
Class Membership.
An object is an instance of a class, which is a collection of similar objects. Objects in the same class share a set of attributes. For example, van Persie is a player object that belongs to the class striker, which is a subclass of the class player. Note that this use of the term "class" is different from the machine learning sense of "class" as a prediction target. (3)
Object Relationships.
Objects are linked to other objects. Both objects and their links have attributes. A common type of object relationship is a component relationship between a complex object and its parts. For example, a match links two teams, and each team comprises a set of players for that match. A difference between relational and vectorial data is therefore that an individual object is characterized not only by a list of attributes, but also by its links and by attributes of the objects linked to it. We refer to the substructure comprising this information as the object data. Equivalent terms are "egonet" from network analysis [3] and "interpretation" [19]. Relational outlier detection aims to identify objects whose data differ from the general population or class. Our approach to this problem leverages statistical-relational model discovery, as follows. a) Approach:
A class-model Bayesian network (BN) structure is learned with data for the entire population. The nodes in the BN represent attributes of links, of multiple types, and attributes of objects, also of multiple types. To learn the BN model, we apply techniques from statistical-relational learning, a recent field that combines AI and machine learning [13], [32], [9]. Given a set of parameter values and an input database, it is possible to compute a class model likelihood that quantifies how well the BN fits the object data. The class model likelihood uses BN parameter values estimated from the entire class data. This is a relational extension of the standard log-likelihood method for i.i.d. vectorial data, which uses the likelihood of a data point as its outlier score. While the class model likelihood is a good baseline score, it can be improved by comparing it to the object model likelihood, which uses BN parameter values estimated from the object data.
The model log-likelihood ratio (LR) is the log-ratio of the object model likelihood to the class model likelihood. This ratio quantifies how the probabilistic associations that hold in the general population deviate from the associations in the object data substructure. While the likelihood ratio discriminates relational outliers better than the class model likelihood alone, it can be improved further by applying two transformations: (1) a mutual information decomposition, and (2) replacing log-likelihood differences by log-likelihood distances. We refer to the resulting novel score as the log-likelihood distance.

[Extended from Riahi and Schulte, 2015 IEEE Symposium Series on Computational Intelligence.]

Fig. 1. A general schema for Exceptional Model Mining for propositional data.

b) Evaluation: Our code and datasets are available online at [28]. Our performance evaluation follows the design of previous outlier detection studies [12], [2], where the methods are scored against a test set of known outliers. We use three synthetic and two real-world datasets, from the UK Premier Soccer League and the Internet Movie Database (IMDb). On the synthetic data we have known ground truth. For the real-world data, we use a one-class design, where one object class is designated as normal and objects from outside the class are the outliers. For example, we compare goalies as outliers against the class of strikers as normal objects. On all datasets, the log-likelihood distance metric achieves the best detection accuracy compared to baseline methods. We also offer case studies where we assess whether individuals that our score ranks as highly unusual in their class are indeed unusual. The case studies illustrate that our outlier score is easy to interpret, because the Bayesian network provides a sum decomposition of the data distributions by features. Interpretability is very important for users of an outlier detection method, as there is often no ground truth to evaluate outliers suggested by the method. c) Related Work:
Section V discusses the relationship to related work in detail. Our approach applies the exceptional model mining (EMM) framework [10] to multi-relational data. Figure 1 illustrates the EMM schema. The EMM framework leverages the extensive work on model learning in machine learning for exception mining: a subgroup is exceptional to the extent that a model learned from data for the subgroup deviates from a model learned for the general population. A computational method for measuring this extent is called a quality measure; we also refer to it as an outlierness metric. For a given model type, finding an appropriate quality measure for quantifying exceptionality is the main research question in EMM. The EMM framework allows us to leverage the extensive work on statistical-relational model learning for exception mining in multi-relational data. Compared to previous EMM models, the novelty of our work is as follows. 1) EMM has so far been developed only for propositional i.i.d. data, not relational data. Accordingly, EMM has not been applied with SRL models. 2) In the propositional i.i.d. setting, each object is represented by a single data row, and it is meaningless to learn a model for a single object. Instead, EMM is applied to identify exceptional subgroups of objects. With relational data, each object is represented by its own dataset (egonet, interpretation), and it is meaningful to apply EMM to identify single exceptional objects. Compared to previous relational outlier detection work, our model-based approach is novel in that it neither summarizes the object data by a feature set (as in the Oddball system, see [3]) nor looks for rules that exceptional objects violate (e.g. [19]). d) Contributions:
Our main contributions may be summarized as follows. 1) The first EMM approach to outlier detection for structured data that is based on a probabilistic model. 2) A new outlier score based on a novel model likelihood comparison, the log-likelihood distance. e) Paper Organization:
We review background about Bayesian networks for relational data. Then we describe how we apply the EMM framework to multi-relational data. We introduce a novel log-likelihood distance outlier score as the quality or outlierness metric. After presenting the details of our approach, we review related work. Empirical evaluation compares model-based and aggregation-based approaches to relational outlier detection, with respect to three synthetic and three real-world problems.

II. BACKGROUND: BAYESIAN NETWORKS FOR RELATIONAL DATA
We adopt the Parametrized Bayes net (PBN) formalism [26] that combines Bayes nets with logical syntax for expressing relational concepts. EMM is an inclusive framework and can in principle be applied with other SRL models, such as Markov Logic networks [9]. We worked with PBNs because i) they offer the most scalable structure learning methods [33] to support our larger datasets, and ii) the PBN conditional probability parameters can be easily interpreted, which means that the resulting exceptionality metrics can be easily interpreted (see Section I-0b below).
A. Bayesian Networks

A Bayesian Network (BN) is a directed acyclic graph (DAG) whose nodes comprise a set of random variables [24]. Depending on context, we interchangeably refer to the nodes and variables of a BN. Fix a set of variables V = {V_1, ..., V_n}. The possible values of V_i are enumerated as {v_{i1}, ..., v_{ir_i}}. The notation P(V_i = v) ≡ P(v) denotes the probability of variable V_i taking on value v. We also use the vector notation P(V = v) ≡ P(v) to denote the joint probability that each variable V_i takes on value v_i. The conditional probability parameters of a Bayesian network specify the distribution of a child node given an assignment of values to its parent nodes. For an assignment of values to its nodes, a BN defines the joint probability as the product of the conditional probability of each child node value given its parent values. This means that the log-joint probability can be decomposed as the node-wise sum

ln P(V = v; B, θ) = Σ_{i=1}^{n} ln θ(v_i | v_{pa_i})    (1)

where v_i resp. v_{pa_i} is the assignment of values to node V_i resp. the parents of V_i determined by the assignment v. To avoid difficulties with ln(0), here and below we assume that joint distributions are positive everywhere. Since the parameter values for a Bayes net define a joint distribution over its nodes, they therefore entail a marginal, or unconditional, probability for a single node. We denote the marginal probability that node V has value v as P(V = v; B, θ) ≡ θ(v). a) Example.: Figure 2 shows an example of a Bayesian network and associated joint and marginal probabilities.

Fig. 2. Example of joint and marginal probabilities computed from a toy Bayesian network structure. The parameters were estimated from the Premier League dataset. (Top): A class model Bayesian network B_C for all teams with class parameters θ_C: P(shotEff=high)=0.38, P(passEff=high)=0.43; P(Result=Win | shotEff, passEff) = 0.44, 0.22, 0.18, 0.07 for (high,high), (high,low), (low,low), (low,high). (Bottom): The same Bayesian network structure with object parameters θ_o learned for Wigan Athletic (T = WA): P(shotEff=high)=0.50, P(passEff=high)=0.61; P(Result=Win | shotEff, passEff) = 0.53, 0.50, 0.00, 0.11 for (high,high), (high,low), (low,low), (low,high). Our model-based outlier scores compare the data likelihood of the class parameters and the object parameters.
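The node-wise sum in Equation (1) can be sketched in code. This is a minimal illustration, not the paper's implementation; the dictionary-based CPT representation and function names are our own, with parameter values taken from the class model of Figure 2.

```python
import math

def joint_log_prob(assignment, cpts, parents):
    """Log-joint probability of a full assignment, as the node-wise sum of
    log conditional probabilities (Equation 1).
    assignment: node -> value; parents: node -> tuple of parent nodes;
    cpts: node -> {(value, parent_values): probability}."""
    total = 0.0
    for node, value in assignment.items():
        pa_vals = tuple(assignment[p] for p in parents[node])
        total += math.log(cpts[node][(value, pa_vals)])
    return total

# Class-model parameters from Figure 2 (top).
parents = {"shotEff": (), "passEff": (), "result": ("shotEff", "passEff")}
cpts = {
    "shotEff": {("high", ()): 0.38, ("low", ()): 0.62},
    "passEff": {("high", ()): 0.43, ("low", ()): 0.57},
    "result": {("win", ("high", "high")): 0.44,
               ("loss", ("high", "high")): 0.56},
}
lp = joint_log_prob({"shotEff": "high", "passEff": "high", "result": "win"},
                    cpts, parents)
# lp = ln(0.38) + ln(0.43) + ln(0.44)
```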
B. Relational Data
We adopt a functor-based notation for combining logical and statistical concepts [26], [16]. A functor is a function or predicate symbol. Each functor has a set of values (constants) called the domain of the functor. The domain of a predicate is {T, F}. Predicates are usually written with uppercase Roman letters, other terms with lowercase letters. A predicate of arity at least two is a relationship functor. Relationship functors specify which objects are linked. Other functors represent features or attributes of an object or a tuple of objects (i.e., of a relationship). A population is a set of objects. A term is of the form f(σ_1, ..., σ_k) where f is a functor and each σ_i is a first-order variable or a constant denoting an object. A term is ground if it contains no first-order variables; otherwise it is a first-order term. In the context of a statistical model, we refer to first-order terms as Parametrized Random Variables (PRVs) [16]. A grounding replaces each first-order variable in a term by a constant; the result is a ground term. A grounding may be applied simultaneously to a set of terms. A relational database D specifies the values of all ground terms, which can be listed in data tables. Consider a joint assignment P(V = v) of values to a set of PRVs V. The grounding space of the PRVs is the set of all possible grounding substitutions, each applied to all PRVs in V. The count of groundings that satisfy the assignment with respect to a database D is denoted by D(V = v). The database frequency P_D(V = v) is the grounding count divided by the number of all possible groundings.

Fig. 3. The EMM approach for statistical-relational models. The model class we utilize in this paper is Parametrized Bayesian networks, with a log-linear likelihood function. As outlierness metrics we consider the standard Kullback-Leibler divergence, and a novel divergence introduced in this paper.

Example.
The Opta dataset represents information about premier league data (Sec. VI-B). The basic populations are teams, players, matches, with corresponding first-order variables T, P, M. Table I specifies values for some ground terms. The first three column headers show first-order variables ranging over different populations. The remaining columns represent features. Table III illustrates grounding counts. Counts are based on the 2011-2012 Premier League Season. We count only groundings (team, match) such that team plays in match. Each team, including Wigan Athletic, appears in 38 matches. The total number of team-match pairs is 38 × 20 = 760.

A novel aspect of our paper is that we learn model parameters for specific objects as well as for the entire population. The appropriate object data table is formed from the population data table by restricting the relevant first-order variable to the target object. For example, the object database for target Team WiganAthletic forms a subtable of the data table of Table I that contains only rows where TeamID = WA; see Table II. In database terminology, an object database is like a view centered on the object. The object database is an individual-centered representation [11].

TABLE I. SAMPLE POPULATION DATA TABLE (SOCCER).

MatchId M | TeamId T | PlayerId P | First_goal(P,M) | TimePlayed(P,M) | ShotEff(T,M) | result(T,M)
117       | WA       | McCarthy   | 0               | 90              | 0.53         | win
148       | WA       | McCarthy   | 0               | 85              | 0.57         | loss
15        | MC       | Silva      | 1               | 90              | 0.59         | win
...       | ...      | ...        | ...             | ...             | ...          | ...

TABLE II. SAMPLE OBJECT DATA TABLE, FOR TEAM T = WA.

MatchId M | TeamId T = WA | PlayerId P | First_goal(P,M) | TimePlayed(P,M) | ShotEff(WA,M) | result(WA,M)
117       | WA            | McCarthy   | 0               | 90              | 0.53          | win
148       | WA            | McCarthy   | 0               | 85              | 0.57          | loss
...       | WA            | ...        | ...             | ...             | ...           | ...

TABLE III. EXAMPLE OF GROUNDING COUNT AND FREQUENCY IN PREMIER LEAGUE DATA, FOR THE CONJUNCTION passEff(T,M) = hi, shotEff(T,M) = hi, Result(T,M) = win.

Database        | Count D(V = v) | Frequency P_D(V = v)
Population      | 76             | 76/760 = 0.10
Wigan Athletic  | 7              | 7/38 = 0.18

C. Bayesian Networks for Relational Data

A Parametrized Bayesian Network Structure (PBN) is a Bayesian network structure whose nodes are PRVs. The relationships and features in an object database define a set of nodes for Bayes net learning; see Figure 2.
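The database frequency P_D(V = v) of Section II-B can be sketched as a count over grounding rows. This is an illustrative sketch with made-up rows at the (team, match) grain; the row contents and helper name are our own, not part of the Opta dataset.

```python
def db_frequency(rows, assignment):
    """P_D(V = v): fraction of groundings (rows) matching every
    (variable, value) pair in the assignment."""
    count = sum(all(row[k] == v for k, v in assignment.items())
                for row in rows)
    return count / len(rows)

# Three illustrative (team, match) groundings, mimicking Table I.
rows = [
    {"team": "WA", "match": 117, "passEff": "hi", "shotEff": "hi", "result": "win"},
    {"team": "WA", "match": 148, "passEff": "lo", "shotEff": "hi", "result": "loss"},
    {"team": "MC", "match": 15,  "passEff": "hi", "shotEff": "hi", "result": "win"},
]

freq = db_frequency(rows, {"passEff": "hi", "shotEff": "hi", "result": "win"})
# 2 of the 3 rows satisfy the conjunction -> 2/3
```

On the real data this count over all 760 team-match groundings yields 76/760 = 0.10 for the population, as in Table III.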
1) Model Likelihood for Parametrized Bayesian Networks:
A standard method for applying a generative model assumes that the generative model represents normal behavior, since it was learned from the entire population. An object is deemed an outlier if the model assigns sufficiently low likelihood to generating its features [6]. This likelihood method is an important baseline for our investigation. Defining a likelihood for relational data is more complicated than for i.i.d. data, because an object is characterized not only by a feature vector, but by an object database. We employ the relational pseudo log-likelihood [31], which can be computed as follows for a given Bayesian network and database.

LOG(D, B, θ) = Σ_{i=1}^{n} Σ_{j=1}^{r_i} Σ_{pa_i} P_D(v_{ij}, pa_i) ln θ(v_{ij} | pa_i)    (2)

Equation (2) represents the standard BN log-likelihood function for the object data [8], except that parent-child instantiation counts are standardized to be proportions [31]. The equation can be read as follows. 1) For each parent-child configuration, use the conditional probability of the child given the parent. 2) Multiply the logarithm of the conditional probability by the database frequency of the parent-child configuration. 3) Sum this product over all parent-child configurations and all nodes. Schulte proves that the maximum of the pseudo-likelihood (2) is given by the empirical database frequencies [31, Prop. 3.1]. In all our experiments we use these maximum likelihood parameter estimates. Example.
The family configuration passEff(T,M) = hi, shotEff(T,M) = hi, Result(T,M) = win contributes one term to the pseudo log-likelihood for the BN of Figure 2. For the population database, this term is 0.10 × ln(0.44) ≈ −0.08. For the Wigan Athletic database, the term is 0.18 × ln(0.44) ≈ −0.15.

III. EMM FOR RELATIONAL DATA
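The pseudo log-likelihood of Equation (2) is a frequency-weighted sum of log conditional probabilities, which is straightforward to sketch. The dictionary keying and function name below are our own; the two terms reproduce the worked example, using the Table III frequencies with the class parameter θ_C(win | hi, hi) = 0.44.

```python
import math

def pseudo_log_likelihood(family_freqs, theta):
    """Relational pseudo log-likelihood (Equation 2): sum over family
    configurations of P_D(child, parents) * ln theta(child | parents).
    family_freqs and theta are keyed by (node, value, parent_values)."""
    return sum(freq * math.log(theta[key])
               for key, freq in family_freqs.items())

key = ("Result", "win", ("hi", "hi"))  # passEff=hi, shotEff=hi, Result=win
term_population = pseudo_log_likelihood({key: 76 / 760}, {key: 0.44})
term_wigan = pseudo_log_likelihood({key: 7 / 38}, {key: 0.44})
# term_population = 0.10 * ln(0.44), term_wigan = 0.18 * ln(0.44)
```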
This section describes our approach to applying the EMM framework to relational data, using the following notation.
• D_C is the database for the entire class of objects; cf. Table I. This database defines the class distribution P_C ≡ P_{D_C}.
• D_o is the restriction of the input database to the target object; cf. Table II. This database defines the object distribution P_o ≡ P_{D_o}.
• B_C is a model (e.g., Bayesian network) learned with D_C as the input database; cf. Figure 2 (top).
• θ_C resp. θ_o are parameters learned for B_C using D_C resp. D_o as the input database.

Figure 3 illustrates these concepts and the system flow for computing an outlierness score. First, we learn a Bayesian network B_C for the entire population using a previous learning algorithm (see Section VI-C below). We then evaluate how well the class model fits the target object data. For vectorial data, the standard model fit metric is the log-likelihood of the target datapoint. For relational data, the counterpart is the relational log-likelihood (2) of the target database:

LOG(D_o, B_C, θ_C).    (3)

While this is a good baseline outlier score, it can be improved by considering scores based on the likelihood ratio, or log-likelihood difference:

LR(D_o, B_C, θ_o) ≡ LOG(D_o, B_C, θ_o) − LOG(D_o, B_C, θ_C).    (4)

The log-likelihood difference compares how well the class-level parameters fit the object data, vs. how well the object parameters fit the object data. In terms of the conditional probability parameters, it measures how much the log-conditional probabilities in the class distribution differ from those in the object distribution. Note that this definition applies only for relational data, where an individual is characterized by a substructure rather than a "flat" feature vector. Assuming maximum likelihood parameter estimation, LR is equivalent to the Kullback-Leibler divergence between the class-level and object-level parameters [8].
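Equation (4) can be sketched as the difference of two pseudo log-likelihood evaluations, both over the object's family-configuration frequencies. The helper names are our own, and the parameter values below are illustrative, in the spirit of the Figure 2 comparison of Wigan Athletic against the class model.

```python
import math

def log_likelihood(freqs, theta):
    """Pseudo log-likelihood (Equation 2) over family configurations."""
    return sum(f * math.log(theta[k]) for k, f in freqs.items())

def lr_score(object_freqs, theta_object, theta_class):
    """Log-likelihood difference LR (Equation 4): object parameters vs.
    class parameters, both evaluated on the object data frequencies."""
    return (log_likelihood(object_freqs, theta_object)
            - log_likelihood(object_freqs, theta_class))

key = ("Result", "win", ("hi", "hi"))
object_freqs = {key: 0.18}          # Wigan's frequency of this configuration
score = lr_score(object_freqs,
                 {key: 0.53},        # object parameter (Figure 2, bottom)
                 {key: 0.44})        # class parameter (Figure 2, top)
# score = 0.18 * (ln 0.53 - ln 0.44) > 0: the object parameters fit better
```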
While the LR score provides more outlier information than the model log-likelihood, it can be improved further by two transformations, as follows. (1) Decompose the joint probability into a single-feature component and a mutual information component. (2) Replace log-likelihood differences by log-likelihood distances. The resulting score is the log-likelihood distance (ELD), which is the main novel score we propose in this paper. Formally it is defined as follows for each feature i. The total score is the sum of feature-wise scores. Section IV below provides example computations.

ELD_i = Σ_{j=1}^{r_i} P_o(v_{ij}) | ln( θ_o(v_{ij}) / θ_C(v_{ij}) ) |
      + Σ_{j=1}^{r_i} Σ_{pa_i} P_o(v_{ij}, pa_i) | ln( θ_o(v_{ij} | pa_i) / θ_o(v_{ij}) ) − ln( θ_C(v_{ij} | pa_i) / θ_C(v_{ij}) ) |.    (5)

The first sum is the single-feature component, where each feature is considered independently of all others. It computes the expected log-distance between the object and the class models with respect to the single feature value probabilities. The second ELD sum is the mutual information component, based on the mutual information among all features. It computes the expected log-distance between the object and the class models with respect to the mutual information of feature value assignments. Intuitively, the first sum measures how the models differ if we treat each feature in isolation. The second sum measures how the models differ in terms of how strongly parent and child features are associated with each other.
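The two components of Equation (5) can be sketched for a single feature. This is our own minimal implementation, exercised on the high-correlation thought experiment of Section IV (outlier: features independent at 0.5; class: P(F2=1|F1=1) = 0.9, P(F2=1|F1=0) = 0.1); the data structures are assumptions for illustration.

```python
import math

def eld_feature(p_o_single, p_o_family, th_o, th_c, th_o_cond, th_c_cond):
    """ELD_i (Equation 5): expected absolute log-distance of the
    single-feature parameters, plus expected absolute log-distance of
    the mutual-information (lift) terms. Family keys are (value, parent)."""
    single = sum(p * abs(math.log(th_o[v] / th_c[v]))
                 for v, p in p_o_single.items())
    mutual = sum(p * abs(math.log(th_o_cond[k] / th_o[k[0]])
                         - math.log(th_c_cond[k] / th_c[k[0]]))
                 for k, p in p_o_family.items())
    return single + mutual

# High-correlation scenario: outlier independent, class strongly correlated.
p_o_single = {1: 0.5, 0: 0.5}
p_o_family = {(1, 1): 0.25, (0, 1): 0.25, (1, 0): 0.25, (0, 0): 0.25}
th_o = {1: 0.5, 0: 0.5}
th_c = {1: 0.5, 0: 0.5}
th_o_cond = {(1, 1): 0.5, (0, 1): 0.5, (1, 0): 0.5, (0, 0): 0.5}
th_c_cond = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.1, (0, 0): 0.9}
score = eld_feature(p_o_single, p_o_family, th_o, th_c, th_o_cond, th_c_cond)
# single-feature component is 0 (uniform singles); mutual component > 0
```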
A. Motivation
The motivation for the mutual information decomposition is two-fold. (1) Interpretability, which is very important for outlier detection. The single-feature components are easy to interpret since they involve no feature interactions. Each parent-child local factor is based on the average relevance of parent values for predicting the value of the child node, where relevance is measured by ln( θ(v_{ij} | pa_i) / θ(v_{ij}) ). This relevance term is basically the same as the widely used lift measure [35], and therefore an intuitively meaningful quantity. The ELD score compares how relevant a given parent condition is in the object data with how relevant it is in the general class. (2)
Avoiding cancellations.
Each term in the log-likelihood difference (4) decomposes into a relevance difference and a marginal difference:

ln( θ_o(v_{ij} | pa_i) / θ_C(v_{ij} | pa_i) ) = [ ln( θ_o(v_{ij} | pa_i) / θ_o(v_{ij}) ) − ln( θ_C(v_{ij} | pa_i) / θ_C(v_{ij}) ) ] + ln( θ_o(v_{ij}) / θ_C(v_{ij}) ).    (6)

These differences can have different signs for different child-parent configurations and cancel each other out; see Table IV. Taking distances as in Equation (5) avoids this undesirable cancellation. Since our goal is to assess the distinctness of an object, we do not want differences to cancel out. The general point is that averaging differences is appropriate when considering costs, or utilities, but not appropriate for assessing the distinctness of an object. For instance, the components of both vectors (0,0) and (1,-1) average to 0, but their distances differ.
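The cancellation effect is easy to demonstrate numerically. The following toy arithmetic is our own illustration: two signed log-ratio terms of equal magnitude and opposite sign sum to zero, while their absolute values (distances) do not.

```python
import math

# Two child-parent configurations whose log-ratios have opposite signs:
# ln(0.9/0.5) = +0.588 and ln(0.5/0.9) = -0.588.
terms = [math.log(0.9 / 0.5), math.log(0.5 / 0.9)]

signed = sum(terms)                    # differences cancel to 0
distance = sum(abs(t) for t in terms)  # distances accumulate
# signed == 0 exactly, yet the object's associations clearly differ;
# the distance-based score keeps this information.
```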
B. Comparison Outlier Scores
Our lesion study compares our log-likelihood distance ELD score to baselines that are defined by omitting a component of ELD. In this section we define these scores. The scores increase in sophistication in the sense that they apply more transformations of the log-likelihood ratio. More sophisticated scores provide more information about outliers. Table IV defines local feature scores; the total score is the sum of feature-wise scores. All metrics are defined such that a higher score indicates a greater anomaly. The metrics are as follows.

TABLE IV. BASELINE OUTLIER SCORES FOR BAYESIAN NETWORKS

FD_i = Σ_{j=1}^{r_i} P_o(v_{ij}) | ln( θ_o(v_{ij}) / θ_C(v_{ij}) ) |

−LOG_i = − Σ_{j=1}^{r_i} Σ_{pa_i} P_o(v_{ij}, pa_i) ln θ_C(v_{ij} | pa_i)

LR_i = Σ_{j=1}^{r_i} Σ_{pa_i} P_o(v_{ij}, pa_i) ln( θ_o(v_{ij} | pa_i) / θ_C(v_{ij} | pa_i) )

|LR_i| = Σ_{j=1}^{r_i} Σ_{pa_i} P_o(v_{ij}, pa_i) | ln( θ_o(v_{ij} | pa_i) / θ_C(v_{ij} | pa_i) ) |

LR+_i = Σ_{j=1}^{r_i} P_o(v_{ij}) ln( θ_o(v_{ij}) / θ_C(v_{ij}) ) + Σ_{j=1}^{r_i} Σ_{pa_i} P_o(v_{ij}, pa_i) [ ln( θ_o(v_{ij} | pa_i) / θ_o(v_{ij}) ) − ln( θ_C(v_{ij} | pa_i) / θ_C(v_{ij}) ) ]
Feature Divergence FD is the first component of the ELD score. It considers each feature independently (no feature correlations).
Log-Likelihood Score LOG is the standard model-based outlier detection score using the data likelihood.
Log-Likelihood Difference LR is the log-likelihood difference (4) between the class-level and object-level parameters.
Log-Likelihood Difference with absolute value |LR| replaces the differences in LR by distances.
Log-Likelihood Difference with decomposition LR+ applies a mutual information decomposition to LR.

IV. EXAMPLES
We provide three simple examples with only two features that illustrate the computation of the outlier scores. They are designed so that outliers and normal objects are easy to distinguish, and so that it is easy to trace the behavior of an outlier score. The examples therefore serve as thought experiments that bring out the strengths and weaknesses of model-based outlier scores. Figure 4 describes the BN representation of the examples. Table V provides the computation of the scores. For intuition, we can think of a soccer setting, where each match assigns a value to each attribute F_i, i = 1, 2, for each player. Scores for the F_2 feature are computed conditional on F_1. Expectation terms are computed first for F_1 = 1, then F_1 = 0. The single feature distributions are uniform, so the feature component of ELD is 0 for each node in both examples. The table illustrates the undesirable cancelling effects in LR. In the high correlation scenario 4(a), the outlier object has a lower probability than the normal class distribution of Match_Result = 1 given that Shot_Efficiency = 1; specifically, 0.5 vs. 0.9. The outlier object exhibits a higher probability of Match_Result = 1 than the normal class distribution, conditional on Shot_Efficiency = 0; specifically, 0.5 vs. 0.1. In line 1, column 2 of Table V the log-ratios ln(0.5/0.9) and ln(0.5/0.1) therefore have different signs. In the low correlation scenario 4(b), the cancelling occurs in the same way, but with the normal and outlier probabilities reversed. The cancelling effect is even stronger for attributes with more than two possible values.

[Figure 4 parameters. F2 = Match_Result throughout. (a) F1 = Shot_Efficiency. Normal = Striker: P(F1=1) = 50%, P(F2=1|F1=1) = 90%, P(F2=0|F1=0) = 90%. Outlier = Midfielder: P(F1=1) = 50%, P(F2=1) = 50%. (b) F1 = Tackle_Efficiency. Normal = Striker: P(F1=1) = 50%, P(F2=1) = 50%. Outlier = Midfielder: P(F1=1) = 50%, P(F2=1|F1=1) = 90%, P(F2=0|F1=0) = 90%. (c) F1 = Shots_On_Target. Normal = Striker: P(F1=1) = 10%, P(F2=1|F1=1) = 90%, P(F2=0|F1=0) = 90%. Outlier = Midfielder: P(F1=1) = 90%, P(F2=1|F1=1) = 90%, P(F2=0|F1=0) = 90%.]
Fig. 4. Illustrative Bayesian networks. The networks are not learned from data, but hand-constructed to be plausible for the soccer domain. (a) High Correlation: Normal individuals exhibit a strong association between their features, outliers no association. Both normals and outliers have a close to uniform distribution over single features. (b) Low Correlation: Normal individuals exhibit no association between their features, outliers have a strong association. Both normals and outliers have a close to uniform distribution over single features. (c) Single Attributes: Both normal and outlier individuals exhibit a strong association between their features. In normals, 90% of the time, feature 1 has value 0. For outliers, feature 1 has value 0 only 10% of the time.

TABLE V. EXAMPLE COMPUTATION OF DIFFERENT OUTLIER SCORES.

Table V(a): High Correlation Case, Figure 4(a).
LR:   F1: 1/2 ln(0.5/0.5) + 1/2 ln(0.5/0.5) = 0.
      F2|F1: 1/2 ln(0.5/0.9) + 1/2 ln(0.5/0.1) ≈ 0.51.
|LR|: F1: (no parents) 0.
      F2|F1: 1/2 |ln(0.5/0.9)| + 1/2 |ln(0.5/0.1)| ≈ 1.10.
FD:   F1: |ln(0.5/0.5)| = 0. F2: 1/2 |ln(0.5/0.5)| + 1/2 |ln(0.5/0.5)| = 0.
ELD:  mutual information component + FD ≈ 1.10 + 0 = 1.10.

Table V(b): Low Correlation Case, Figure 4(b).
LR:   F1: 1/2 ln(0.5/0.5) + 1/2 ln(0.5/0.5) = 0.
      F2|F1: 0.9 · ln(0.9/0.5) + 0.1 · ln(0.1/0.5) ≈ 0.37.
|LR|: F1: (no parents) 0.
      F2|F1: 0.9 · |ln(0.9/0.5)| + 0.1 · |ln(0.1/0.5)| ≈ 0.69.
FD:   F1: |ln(0.5/0.5)| = 0. F2: 1/2 |ln(0.5/0.5)| + 1/2 |ln(0.5/0.5)| = 0.
ELD:  mutual information component + FD ≈ 0.69 + 0 = 0.69.
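The high-correlation thought experiment can be checked by direct arithmetic. The following is our own verification sketch: the outlier's F2|F1 terms partially cancel in LR but accumulate in |LR|, using the Figure 4(a) parameters (class conditionals 0.9 and 0.1, outlier conditionals uniform at 0.5).

```python
import math

# Each (F2, F1) log-ratio carries object probability mass 1/2 in total,
# since the outlier's distribution is uniform over both features.
p = 0.5
lr_f2 = p * math.log(0.5 / 0.9) + p * math.log(0.5 / 0.1)
alr_f2 = p * abs(math.log(0.5 / 0.9)) + p * abs(math.log(0.5 / 0.1))
# lr_f2 is shrunk by cancellation; alr_f2 keeps the full magnitude
```

The signed score lr_f2 understates the anomaly relative to the distance-based alr_f2, which is the motivation for using absolute log-distances in ELD.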
V. RELATED WORK

Outlier detection is a densely researched field; for a survey, see [2]. Figure 5 provides a tree picture of where our method is situated with respect to other outlier detection methods and other data models. Our method falls in the category of unsupervised statistical model-based approaches. To our knowledge, ours is the first model-based method tailored for object-relational data. Like other model-based approaches, it detects global outliers.
Aggarwal [2] defines a global outlier to be a data point that notably deviates from the rest of the population. We review relevant approaches from different data models: the most common atomic object model, where data is represented by vectors, and structured data models. Akoglu et al. provide an excellent recent survey of outlier detection in relational models [3]. a) Attribute Vector Data Model:
By far most work on outlier detection considers atomic objects with flat feature vectors. This leads to an impedance mismatch: the required input format for these outlier detection methods is a single data matrix, not a structured dataset. For example, one cannot provide a relational database as input. This mismatch is not simply a question of choosing a file format, but instead reflects a different underlying data model: complex objects with both attributes and component objects vs. atomic objects with attributes only. It is possible to "flatten" structured data by converting it to unstructured feature vectors, for instance by using aggregate functions. We evaluated the aggregation approach in this paper by applying three standard methods for outlier detection.

Fig. 5. A tree structure for related work on outlier detection for structured data. A path specifies an outlier detection problem; the leaves list major approaches to the problem. Approaches in italics appear in experiments.

Work on atomic contextual outliers [34] is like ours in that it considers the distinctness of a target individual from a reference class. A reference class is not specified for each object, but is constructed as part of outlier detection. Our work could be combined with a class discovery approach by providing a score of how informative the inferred classes are. b) Structured Data Models:
We discuss related techniques in three types of structured data models: SQL (relational), XML (hierarchical), and OLAP (multi-dimensional).

For relational data, many outlier detection approaches aim to discover rules that represent the presence of anomalous associations for an individual or the absence of normal associations [19], [12]. The survey by [23] unifies, within a general rule search framework, related tasks such as exception mining, which looks for associations that characterize unusual cases; subgroup mining, which looks for associations characterizing important subgroups; and contrast space mining, which looks for differences between classes. Another rule-based approach uses Inductive Logic Programming techniques [4]. While local rules are informative, they are not based on a global statistical model and do not provide a single outlier score for each individual.

A latent variable approach in information networks ranks potential outliers in reference to the latent communities inferred by network analysis [12]. Our model aggregates information from entities and links of different types, but does not assume that different communities have been identified.

Extended from Riahi and Schulte, 2015 IEEE Symposium Series on Computational Intelligence.

Koh et al. [17] propose a method for hierarchical structures represented in XML document trees. Their aim is to identify feature outliers, not class outliers as in our work. Also, they use aggregate functions to convert the object hierarchy into feature vectors. Their outlier score is based on local correlations, and they do not construct a model.

The multi-dimensional data model defines numeric measures for a set of dimensions. The differences in the two data models mean that multi-dimensional outlier detection models [30] do not carry over to object-relational outlier detection. (1) The object data model allows but does not require any numeric measures.
In our datasets, all features are discrete. Nor do we assume that it is possible to aggregate numeric measures to summarize lower-level data at higher levels. (2) In scoring a potential outlier object, our method considers other objects both below and above the target object in the component hierarchy. OLAP exploration methods consider only cells below or at the same level as the target cell. For example, in scoring a player, our method would consider features of the player’s team. Also, the
ELD outlier score of an object is not determined by the outlier scores of its components, in contrast to the approach of Sarawagi et al. [30]. (3) Our approach models a joint distribution over features, exploiting correlations among features. Most of the OLAP-based methods consider only a single numeric measure at a time, not a joint model.

Statistical data cleaning methods are related to outlier detection, in that erroneous data may be detected as outliers (e.g., [7]). Nonetheless, these data cleaning methods differ from our work in several ways. 1) Although they often originate in the database community, they are usually developed only for single-table propositional data, not relational data. (An exception is the ERACER system [20].) 2) Our work assumes that the data is (mainly) correct, and identifies exceptional entities for the given data. 3) Data cleaning methods focus on unusual values or tuples (e.g., a mistaken rating for a movie by a user), not exceptional subdatabases or egonets.

VI. EXPERIMENTAL DESIGN
All the experiments were performed on a 64-bit CentOS machine with 4 GB RAM and an Intel Core i5-480M processor. The likelihood-based outlier scores were computed with SQL queries using JDBC, JRE 1.7.0, and MySQL Server version 5.5.34. We describe the datasets used in our experiments.
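As a hedged illustration of the SQL-based computation (a toy in-memory schema with illustrative names, not the actual Opta tables or the paper's queries), the sufficient statistics behind the likelihood scores are feature-value counts, which a GROUP BY query delivers directly:

```python
import sqlite3

# Toy in-memory schema, loosely mirroring per-match player features;
# the table and column names are illustrative, not the real Opta schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PlayerMatch (player TEXT, match_id INT, DribbleEff TEXT)")
conn.executemany(
    "INSERT INTO PlayerMatch VALUES (?, ?, ?)",
    [("dzeko", m, "low" if m < 6 else "high") for m in range(38)],
)

# Sufficient statistics for one player's object database: feature-value counts.
counts = dict(conn.execute(
    "SELECT DribbleEff, COUNT(*) FROM PlayerMatch "
    "WHERE player = 'dzeko' GROUP BY DribbleEff"
).fetchall())
print(counts["low"], counts["high"])  # 6 32
```

The same pattern scales to joint counts over a feature and its parents by grouping on several columns.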
A. Synthetic Datasets
We generated three synthetic datasets with normal and outlier players, using the distributions represented in the three Bayesian networks of Figure 4. Each player participates in 38 matches, similar to the real-world data. The main goal of designing synthetic experiments is to test the methods on easy-to-detect outliers. Each match assigns a value to each feature F_i for each player.

High Correlation: see Figure 4(a).
Low Correlation: see Figure 4(b).
Single features: see Figure 4(c).

We used the mlbench package in R to generate synthetic features in matches, following these distributions for 240 normal players and 40 outliers. We followed the real-world Opta data in terms of the number of normal and outlier individuals. The scores are used to rank all 280 players.

TABLE VI. Outlier/normal objects in real-world datasets.

B. Real-World Datasets
Data tables are prepared from Opta data [21] and IMDb [14]. Our datasets and code are available online [15]. a) Soccer Data:
The Opta data were released by Manchester City. They list box scores, that is, counts of all the ball actions within each game by each player, for the 2011-2012 season. For each player in a match, our data set contains eleven player features. For each team in a match, there are five features computed as player feature aggregates, as well as the team formation and the result (win, tie, loss). There are two relationships,
Appears_Player(P, M) and Appears_Team(T, M). b) IMDB Data: The Internet Movie Database (IMDB) is an online database of information related to films, television programs, and video games. The IMDB website offers a dataset containing information on cast, crew, titles, technical details, and biographies in a set of compressed text files. We preprocessed the data as in [25] to obtain a database with seven tables: one for each population and one for each of the three relationships
Rated(User, Movie), Directs(Director, Movie), and ActsIn(Actor, Movie).

In real-world data, there is no ground truth about which objects are outliers. To address this issue, we employ a one-class design: we learn a model for the class distribution, with data from that class only. Then we rank all individuals from the normal class together with all objects from a contrast class treated as outliers, to test whether an outlier score recognizes objects from the contrast class as outliers. Table VI shows the normal and contrast classes for three different datasets. In-class outliers are possible; e.g., unusual strikers are still members of the striker class. Our case studies describe a few in-class outliers. In the soccer data, we considered only individuals who played more than 5 matches out of a maximum of 38.

C. Methods Compared
We compare two types of approaches, and within each approach several outlier detection methods. The first approach evaluates the likelihood-based outlier scores described in Section III. For relational Bayesian network structure learning we utilize the previous learn-and-join algorithm (LAJ), which is a state-of-the-art BN structure learning method for relational data [32]. The LAJ algorithm employs an iterative deepening strategy, which can be described as a search through a lattice of table joins. For each table join, different BNs are learned and the learned edges are propagated from smaller to larger table joins. For a full description, complexity analysis, and learning time measurements, please see [32]. We used the implementation of the LAJ algorithm due to its creators [15].

The second approach first “flattens” the structured data into a matrix of feature vectors, then applies standard matrix-based outlier detection methods. We refer to such methods as aggregation-based (cf. Figure 5). For example, this was the approach taken by Breunig et al. for identifying anomalous players in sports data [5]. Following their paper, for each continuous feature in the object data, we use the average over its values, and for each discrete feature, we use the occurrence count of each feature value in the object data. Aggregation tends to lose information about correlations. Our experiments address the empirical question of whether this loss of information affects outlier detection. We evaluated three standard matrix-based outlier detection methods: density-based LOF [5], distance-based KNNOutlier [27], and subspace analysis with OutRank [22]. These represent common, fundamental approaches for vectorial data. Like ELD, subspace analysis is sensitive to correlations among features. We used the available implementations of all three data matrix methods from the state-of-the-art data mining software ELKI [1]. We used PROCLUS as the clustering function for OutRank, as recommended by [22].

VII. EMPIRICAL RESULTS
We present results regarding computational feasibility, predictive performance, and case studies.
1) Computational Cost of the ELD Score: Table VII shows that the computation of the ELD value for a given target object is feasible. On average, it takes a quarter of a minute for each soccer player, and one minute for each movie. This includes the time for parameter learning from the object database. Learning the class model BN takes longer, but needs to be done only once for the entire object class.
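Parameter learning from an object database amounts to simple conditional frequency counts. The following sketch is a minimal illustration under assumptions (hypothetical feature names and a Laplace smoothing choice; not the paper's exact estimator):

```python
from collections import Counter

def estimate_cpt(rows, child, parents, child_values, alpha=1.0):
    """Estimate P(child | parent configuration) by smoothed conditional
    frequencies. The same routine can be run on the whole class database
    (class-level parameters) or on one object's database (object-level)."""
    joint = Counter()
    parent_counts = Counter()
    for r in rows:
        cfg = tuple(r[p] for p in parents)
        joint[(cfg, r[child])] += 1
        parent_counts[cfg] += 1
    cpt = {}
    for cfg in parent_counts:
        denom = parent_counts[cfg] + alpha * len(child_values)
        cpt[cfg] = {v: (joint[(cfg, v)] + alpha) / denom for v in child_values}
    return cpt

# Hypothetical per-match rows for one player (not the real Opta features).
rows = [{"ShotEff": "high", "DribbleEff": "low"},
        {"ShotEff": "high", "DribbleEff": "low"},
        {"ShotEff": "low", "DribbleEff": "high"}]
cpt = estimate_cpt(rows, "DribbleEff", ["ShotEff"], ["low", "high"])
print(cpt[("high",)]["low"])  # (2 + 1) / (2 + 2) = 0.75
```

The smoothing term keeps all parameters strictly positive, so log-parameters stay finite when likelihood scores are computed.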
The BN model provides a crucial low-dimensional representation of the distribution information in the data. Table VIII compares the number of terms required to compute the ELD score in the BN representation to the number of terms in an unfactored representation with one parameter for each joint probability.
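To see why the factored representation helps, one can count parameters directly. The sketch below uses toy cardinalities and an assumed chain structure, not the networks learned in the paper:

```python
import math

def joint_param_count(cards):
    """Parameters in an unfactored joint distribution:
    one per value combination, minus one for normalization."""
    return math.prod(cards) - 1

def bn_param_count(cards, parents):
    """Parameters in a Bayesian network: each node contributes
    (|values| - 1) free parameters per parent configuration."""
    total = 0
    for node, card in enumerate(cards):
        configs = math.prod(cards[p] for p in parents.get(node, []))
        total += (card - 1) * configs
    return total

# Toy example: five ternary features with an assumed chain structure
# F0 -> F1 -> F2 -> F3 -> F4.
cards = [3, 3, 3, 3, 3]
chain = {1: [0], 2: [1], 3: [2], 4: [3]}
print(joint_param_count(cards), bn_param_count(cards, chain))  # 242 26
```

Even in this small example the factored count is an order of magnitude smaller; the gap widens exponentially with the number of features.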
TABLE VII. Time (min) for computing the ELD score.

Dataset                 | Class Model | Average per Object
Strikers vs. Goalies    | 4.14        | 0.25
Midfielder vs. Goalies  | 4.02        | 0.25
Drama vs. Comedy        | 8.30        | 1.00
TABLE VIII. The Bayesian network representation decreases the number of terms required for computing the ELD score.
2) Detection Accuracy:
We follow the evaluation design of [12] and make each baseline method detect the same percentage of objects as outliers: sort the outlier scores obtained by the three baseline methods in descending order, and take the top r percent as outliers. Then we use precision, a.k.a. true positive rate, as the evaluation metric, which is the percentage of correct ones in the set of outliers identified by the algorithm. As in [12], we set the percentages of outliers to be 1% and 5%. In the one-class design, precision measures how many members of the outlier class were correctly recognized. We also report some AUC measurements [2], which aggregate precision values at different percentage cutoffs.

TABLE IX. Precision of outlier scores in different datasets. Columns: dataset and percentage (1% and 5%); model-based scores ELD, |LR|, LR, FD, LOG; aggregation-based methods LOF, OutRank, KNNOutlier.

TABLE X. AUC of ELD vs. |LR|. Columns: score; High-Cor.; Low-Cor.; Single-F.; Striker; Midfielder; Drama. Rows: ELD, |LR|.

a) Likelihood-Based Methods: Table IX shows the
precision values for each probabilistic ranking. Our ELD score achieves the top score on each dataset. On the synthetic data, ELD and |LR| are the only scores with 100% precision at 1% and 5%. This confirms the value of using distances rather than differences. While it ought to be easy to distinguish the outliers, Table X shows that ELD is the only score that achieves perfect detection, that is, AUC = 1.0. b) Aggregation-Based Methods vs. ELD:
Table IX shows the precision values for aggregation-based methods compared to ELD. Our ELD score outperforms all aggregation-based methods on all datasets, except for a tie with OutRank (PROCLUS) on the relatively easy problem of distinguishing strikers from goalies. The performance of the aggregation-based methods is most like that of the probabilistic score FD, which does not consider the correlations among the features. This finding reflects the fact that aggregation tends to lose information about correlations. The aggregation-based methods achieve their highest performance on the Strikers vs. Goalies dataset. In this dataset, action count features such as ShotsOnTarget and ShotEfficiency point to strikers, and the feature SavesMade points to goalies. Therefore, outliers in this dataset are easy to find by considering features in isolation.
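The feature-wise behavior that the aggregation baselines resemble can be made concrete. The sketch below is a simplified single-feature analogue of the paper's feature divergence, with toy numbers; the full ELD score additionally includes a mutual information term over parent-child correlations, which is omitted here:

```python
import math

def expected_log_distance(theta_obj, theta_cls):
    """Expected L1 distance between object-level and class-level
    log-parameters, taken under the object's distribution. A simplified,
    single-feature analogue of a feature-divergence score; it treats the
    feature in isolation, ignoring correlations with other features."""
    return sum(p * abs(math.log(p) - math.log(q))
               for p, q in zip(theta_obj, theta_cls) if p > 0)

# Hypothetical distributions over one ternary feature (toy numbers).
theta_obj = [0.16, 0.34, 0.50]   # object-level frequencies
theta_cls = [0.50, 0.30, 0.20]   # class-level frequencies
print(round(expected_log_distance(theta_obj, theta_cls), 3))  # 0.683
```

Because the absolute value replaces a signed difference, deviations in opposite directions cannot cancel, which is the "distances rather than differences" point made above.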
3) Case Studies:
For a case study, we examine three top outliers as ranked by ELD, shown in Table XI. The aim of the case study is to provide a qualitative sense of the outliers indicated by the scores. Also, we illustrate how the BN representation leads to an interpretable ranking. Specifically, we employ a feature-wise decomposition of the score combined with a drill-down analysis:
1) Find the node V_i that has the highest ELD_i divergence score for the outlier object.
2) Find the parent-child combination that contributes the most to the ELD_i score for that node.
3) Decompose the ELD score for the parent-child combination into its feature and mutual information components.
(Our ELD score also performs the best with other metrics, such as recall, to a similar degree.)
We present strong associations, indicated by ELD's mutual information component, in the intuitive format of association rules. a) Strikers vs. Goalies:
In real-world data, a rare object may be a within-class outlier, i.e., highly anomalous even within its class. In an unsupervised setting without class labels, we do not expect an outlier score to distinguish such an in-class outlier from outliers outside the class. An example is the striker Edin Dzeko. He is a highly anomalous striker who obtains the top ELD divergence score among both strikers and goalies. His ELD score is highest for the Dribble Efficiency feature. The highest ELD score for that feature occurs when Dribble Efficiency is low and its parents have the following values: Shot Efficiency high, Tackle Efficiency medium. Looking at the single feature divergence, we see that Edin Dzeko is indeed an outlier in the Dribble Efficiency subspace: his dribble efficiency is low in 16% of his matches, whereas a randomly selected striker has low dribble efficiency in 50% of their matches. Thus, Edin Dzeko is an unusually good dribbler. Looking at the mutual information component of ELD, i.e., the parent-child correlations, for Edin Dzeko the confidence of the rule ShotEff = high, TackleEff = medium → DribbleEff = low is 50%, whereas it differs markedly in the general striker class. b) Midfielders vs. Strikers: For the single feature score, Robin van Persie is recognized as a clear striker because of the ShotsOnTarget feature. It makes sense that strikers shoot on target more often than midfielders. Robin van Persie achieves a high number of shots on target in 34% of his matches, compared to 3% for a random midfielder (Table XI). The mutual information component shows that he also exhibits unusual correlations. For example, the confidence of the rule ShotEff = high, TimePlayed = high → ShotsOnTarget = high is 70% for van Persie, whereas for strikers overall it is 52%. The most anomalous midfielder is Scott Sinclair. His most unusual feature is DribbleEfficiency: for feature divergence, he achieves a high dribble efficiency 50% of the time, compared to 30% for a random midfielder (Table XI). c) Drama vs. Comedy: The top outlier rank is assigned to the within-class outlier BraveHeart. Its most unusual feature is ActorQuality: in a random drama movie, 42% of actors have the highest quality level 4, whereas for BraveHeart 93% of actors achieve the highest quality level. The ELD score identifies the comedies BluesBrothers and AustinPowers as the top out-of-class outliers. In a random drama movie, 49% of actors have casting position 3, whereas for AustinPowers 78% of actors have this casting position, and for BluesBrothers 88% of actors do.

VIII. CONCLUSION
We presented a new approach for applying Bayes nets to object-relational outlier detection, a challenging and practically important topic for machine learning. This approach follows the general framework of Exceptional Model Mining [10], and applies it to multi-relational data. The key idea is to learn one set of parameter values that represent class-level associations, another set to represent object-level associations, and compare how well each parametrization fits the relational data that characterize the target object. The classic metric for comparing two parametrized models is their log-likelihood ratio; we refined this concept to define a new relational log-likelihood distance metric via two transformations: (1) a mutual information decomposition, and (2) replacing log-likelihood differences by log-likelihood distances. This metric combines a single feature component, where features are treated as independent, with a correlation component that measures the deviation in the features' mutual information.

In experiments on three synthetic and three real-world outlier sets, the log-likelihood distance achieved the best detection accuracy. The alternative of converting the structured data to a flat data matrix via aggregation had a negative impact. Case studies showed that the log-distance score leads to easily interpreted rankings.

There are several avenues for future work. (i) A limitation of our current approach is that it ranks potential outliers, but does not set a threshold for a binary identification of outlier vs. non-outlier. (ii) Our divergence uses expected L1-distance for interpretability, but other distance scores like L2 could be investigated as well.
(iii) Extending the expected L1-distance to continuous features is a useful addition.

In sum, outlier metrics based on model likelihoods are a new type of structured outlier score for object-relational data. Our evaluation indicates that this model-based score provides informative, interpretable, and accurate rankings of objects as potential outliers.

ACKNOWLEDGEMENT
This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada. We are indebted to Peter Flach for referring us to the EMM framework.

REFERENCES

[1] E. Achtert, H. Kriegel, E. Schubert, and A. Zimek. Interactive data mining with 3D-parallel coordinate trees. In Proceedings of the 2013 ACM SIGMOD, New York, NY, USA, 2013.
[2] C. Aggarwal. Outlier Analysis. Springer New York, 2013.
[3] L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3):626–688, 2015.
[4] F. Angiulli, G. Greco, and L. Palopoli. Outlier detection by logic programming. ACM Transactions on Computational Logic, 2004.
[5] M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proceedings of ACM SIGMOD, 2000.
[6] A. Cansado and A. Soto. Unsupervised anomaly detection in large databases using Bayes nets. Applied Artificial Intelligence, 2008.
[7] S. De, Y. Hu, V. V. Meduri, Y. Chen, and S. Kambhampati. BayesWipe: A scalable probabilistic framework for improving data quality. Journal of Data and Information Quality (JDIQ), 8(1):5, 2016.
[8] L. de Campos. A scoring function for learning Bayes nets based on mutual information and conditional independence tests. Journal of Machine Learning Research, 2006.
TABLE XI. Case study for the top outliers returned by the log-likelihood distance score ELD.

Strikers (Normal) vs. Goalies (Outlier)
PlayerName       | Position   | ELD Rank | ELD Max Node      | ELD Node Score | FD Max Feature Value | Object Probability | Class Probability
Edin Dzeko       | Striker    | 1        | DribbleEfficiency | 83.84          | DE=low               | 0.16               | 0.5
Paul Robinson    | Goalie     | 2        | SavesMade         | 49.4           | SM=Medium            | 0.3                | 0.04
Michel Vorm      | Goalie     | 3        | SavesMade         | 85.9           | SM=Medium            | 0.37               | 0.04

Midfielders (Normal) vs. Strikers (Outlier)
PlayerName       | Position   | ELD Rank | ELD Max Node      | ELD Node Score | FD Max Feature Value | Object Probability | Class Probability
Robin Van Persie | Striker    | 1        | ShotsOnTarget     | 153.18         | ST=high              | 0.34               | 0.03
Wayne Rooney     | Striker    | 2        | ShotsOnTarget     | 113.14         | ST=high              | 0.26               | 0.03
Scott Sinclair   | Midfielder | 6        | DribbleEfficiency | 71.9           | DE=high              | 0.5                | 0.3

Drama (Normal) vs. Comedy (Outlier)
MovieTitle       | Genre      | ELD Rank | ELD Max Node      | ELD Node Score | FD Max Feature Value | Object Probability | Class Probability
Brave Heart      | Drama      | 1        | ActorQuality      | 89995.4        | a_quality=4          | 0.93               | 0.42
Austin Powers    | Comedy     | 2        | Cast_Position     | 61021.28       | Cast_Num=3           | 0.78               | 0.49
Blues Brothers   | Comedy     | 3        | Cast_Position     | 24432.21       | Cast_Num=3           | 0.88               | 0.49

[9] P. Domingos and D. Lowd.
Markov Logic: An Interface Layer for Artificial Intelligence. Morgan and Claypool Publishers, 2009.
[10] W. Duivesteijn, A. J. Feelders, and A. Knobbe. Exceptional model mining. Data Mining and Knowledge Discovery, 30(1):47–98, 2016.
[11] P. A. Flach. Knowledge representation for inductive learning. In Symbolic and Quantitative Approaches to Reasoning and Uncertainty, pages 160–167. Springer, 1999.
[12] J. Gao, F. Liang, W. Fan, Y. Wang, and J. Han. On community outliers and their detection in information networks. In Proceedings of ACM SIGKDD, 2010.
[13] L. Getoor and B. Taskar. Introduction to Statistical Relational Learning. MIT Press, 2007.
[14] The Internet Movie Database (IMDb). [Online].
[15] [Online]. Available: ∼oschulte/jbn/.
[16] A. Kimmig, L. Mihalkova, and L. Getoor. Lifted graphical models: a survey. Computing Research Repository, 2014.
[17] J. L. Koh, M. L. Lee, W. Hsu, and W. T. Ang. Correlation-based attribute outlier detection in XML. In Proceedings of ICDE, 2008.
[18] D. Koller and A. Pfeffer. Object-oriented Bayes nets. In Proceedings of UAI, 1997.
[19] J. Maervoet, C. Vens, G. Vanden Berghe, H. Blockeel, and P. De Causmaecker. Outlier detection in relational data: A case study. Expert Systems with Applications, 2012.
[20] C. Mayfield, J. Neville, and S. Prabhakar. ERACER: a database approach for statistical inference and data cleaning. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 2010.
[21] Opta Sports data.
[22] E. Müller, I. Assent, P. Iglesias, Y. Mülle, and K. Böhm. Outlier ranking via subspace analysis in multiple views of the data. In Proceedings of ICDM, 2012.
[23] P. K. Novak, G. I. Webb, and S. Wrobel. Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research, 2009.
[24] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[25] V. Peralta. Extraction and Integration of MovieLens and IMDb. Technical report, APDM project, 2007.
[26] D. Poole. First-order probabilistic inference. In Proceedings of IJCAI, 2003.
[27] S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proceedings of ACM SIGMOD, 2000.
[28] F. Riahi and O. Schulte. Codes and Datasets. [Online]. Available: ftp://ftp.fas.sfu.ca/pub/cs/oschulte/CodesAndDatasets/, 2015.
[29] F. Riahi and O. Schulte. Model-based outlier detection for object-relational data. In 2015 IEEE Symposium Series on Computational Intelligence, pages 1590–1598. IEEE, 2015.
[30] S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven exploration of OLAP data cubes. In Proceedings of the International Conference on Extending Database Technology. Springer-Verlag, 1998.
[31] O. Schulte. A tractable pseudo-likelihood function for Bayes nets applied to relational data. In Proceedings of SIAM SDM, 2011.
[32] O. Schulte and H. Khosravi. Learning graphical models for relational data via lattice search. Machine Learning, 2012.
[33] O. Schulte, H. Khosravi, and T. Man. Learning directed relational models with recursive dependencies. Machine Learning, 89:299–316, 2012.
[34] G. Tang, J. Bailey, J. Pei, and G. Dong. Mining multidimensional contextual outliers from categorical relational data. In Proceedings of SSDBM, 2013.
[35] S. Tuffery. Data Mining and Statistics for Decision Making. Wiley Series in Computational Statistics, 2011.