An Efficient Diagnosis Algorithm for Inconsistent Constraint Sets
Alexander Felfernig, Monika Schubert, and Christoph Zehentner

Abstract.
Constraint sets can become inconsistent in different contexts. For example, during a configuration session the set of customer requirements can become inconsistent with the configuration knowledge base. Another example is the engineering phase of a configuration knowledge base, where the underlying constraints can become inconsistent with a set of test cases. In such situations we need techniques that support the identification of minimal sets of faulty constraints that have to be deleted in order to restore consistency. In this paper we introduce a divide-and-conquer based diagnosis algorithm (FastDiag) which identifies minimal sets of faulty constraints in an over-constrained problem. This algorithm is specifically applicable in scenarios where the efficient identification of leading (preferred) diagnoses is crucial. We compare the performance of FastDiag with the conflict-directed calculation of hitting sets and present an in-depth performance analysis that shows the advantages of our approach.
Keywords: Interactive Configuration, Preferred Diagnoses, Direct Diagnosis, Model-based Diagnosis, Inconsistent Constraint Sets.
TU Graz, Institute of Software Technology, Applied Software Engineering & AI, Austria, email: {felfernig, schubert, zehentner}@ist.tugraz.at

Preprint of: A. Felfernig, M. Schubert, and C. Zehentner. An Efficient Diagnosis Algorithm for Inconsistent Constraint Sets. Artificial Intelligence for Engineering Design, Analysis, and Manufacturing (AIEDAM), Cambridge University Press, vol. 26, no. 1, pp. 53-62, 2012.

Constraint technologies [27] are applied in different areas such as configuration [22, 14, 26], recommendation [11], and scheduling [3]. There are many scenarios where the underlying constraint sets can become over-constrained. For example, when implementing a configuration knowledge base, constraints can become inconsistent with a set of test cases [10]. Alternatively, when interacting with a configurator application [23, 11], the given set of customer requirements (represented as constraints) can become inconsistent with the configuration knowledge base. In both situations there is a need for intelligent assistance that actively supports the users of a constraint-based application (end users or knowledge engineers). A widespread approach to support users in the identification of minimal sets of faulty constraints is to combine conflict detection (see, e.g., [18]) with a corresponding hitting set algorithm [7, 24, 6]. In their original form these algorithms are applied for the calculation of minimal (cardinality) diagnoses, which are typically determined with breadth-first search. Further diagnosis algorithms have been developed that follow a best-first search regime where the expansion of the hitting set search tree is guided by failure probabilities of components [5]. Another example of such an approach is presented in [11], where similarity metrics are used to guide the (best-first) search for a preferred (plausible) minimal diagnosis (including repairs).

Both simple breadth-first and best-first search diagnosis approaches rely predominantly on the calculation of conflict sets [18]. In this context, the determination of a minimal diagnosis of cardinality n requires the identification of at least n minimal conflict sets. In this paper, we introduce a direct diagnosis algorithm (FastDiag) that determines one minimal diagnosis at a time with the same computational effort that is needed for determining one conflict set at a time. FastDiag supports the identification of preferred diagnoses given predefined preferences regarding a set of decision alternatives. It boosts the applicability of diagnosis methods in scenarios such as online configuration & reconfiguration [10], recommendation of products & services [11], and (more generally) in scenarios where the efficient calculation of preferred (leading) diagnoses is crucial [5]. FastDiag is not restricted to constraint-based systems; it is also applicable, for example, in the context of SAT solving [21] and description logics reasoning [15].

The remainder of this paper is organized as follows. In Section 2 we introduce a simple example configuration task from the automotive domain. In Section 3 we discuss the basic hitting set based approach to the calculation of diagnoses. In Section 4 we introduce an algorithm (FastDiag) for calculating preferred diagnoses for a given over-constrained problem. In Section 5 we present a detailed evaluation of FastDiag, which clearly outperforms standard hitting set based algorithms in the calculation of the topmost-n preferred diagnoses. In Section 6 we provide an overview of related work in the field. The paper is concluded with Section 7.

Car configuration will serve as a working example throughout this paper. Since we exploit configuration problems for the discussion of our diagnosis algorithm, we first introduce a formal definition of a configuration task. This definition is based on [10] but is given in the context of a constraint satisfaction problem (CSP) [27].
Definition 1 (Configuration Task). A configuration task can be defined as a CSP (V, D, C). V = {v_1, v_2, ..., v_n} represents a set of finite domain variables. D = {dom(v_1), dom(v_2), ..., dom(v_n)} represents a set of variable domains, where dom(v_k) represents the domain of variable v_k. C = C_KB ∪ C_R, where C_KB = {c_1, c_2, ..., c_q} is a set of domain-specific constraints (the configuration knowledge base) that restrict the possible combinations of values assigned to the variables in V. C_R = {c_{q+1}, c_{q+2}, ..., c_t} is a set of customer requirements also represented as constraints.

A simplified example of a configuration task in the automotive domain is the following. In this example, type represents the car type, pdc is the park distance control functionality, fuel represents the fuel consumption per 100 kilometers, a skibag allows ski stowage inside the car, and 4-wheel represents the corresponding actuation type. These variables describe the potential set of requirements that can be specified by the user (customer). The possible combinations of these requirements are defined by a set of constraints, denoted as the configuration knowledge base (C_KB), which is defined as C_KB = {c_1, c_2, c_3, c_4} in our example. Furthermore, we assume the set of customer requirements C_R = {c_5, c_6, c_7}.

• V = {type, pdc, fuel, skibag, 4-wheel}
• D = {dom(type) = {city, limo, combi, xdrive}, dom(pdc) = {yes, no}, dom(fuel) = {4l, 6l, 10l}, dom(skibag) = {yes, no}, dom(4-wheel) = {yes, no}}
• C_KB = {c_1: 4-wheel = yes ⇒ type = xdrive, c_2: skibag = yes ⇒ type ≠ city, c_3: fuel = 4l ⇒ type = city, c_4: fuel = 6l ⇒ type ≠ xdrive}
• C_R = {c_5: type = combi, c_6: fuel = 4l, c_7: 4-wheel = yes}

On the basis of this configuration task definition, we can now introduce the definition of a concrete configuration (solution for a configuration task).

Definition 2 (Configuration). A configuration for a given configuration task (V, D, C) is an instantiation I = {v_1 = ins_1, v_2 = ins_2, ..., v_n = ins_n} where ins_k ∈ dom(v_k). A configuration is consistent if the assignments in I are consistent with the c_i ∈ C. Furthermore, a configuration is complete if all variables in V are instantiated. Finally, a configuration is valid if it is consistent and complete.
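The example task is small enough that consistency can be checked by enumerating every complete instantiation. The following Python sketch does exactly that. It is only an illustration under stated assumptions: exhaustive enumeration stands in for a real constraint solver, the names DOMAINS, C_KB, C_R, solutions, and is_consistent are hypothetical, and the fuel values 6l and 10l are assumed for the example domain.

```python
from itertools import product

# Variable domains of the car example (fuel values 6l and 10l are assumed).
DOMAINS = {
    "type": ["city", "limo", "combi", "xdrive"],
    "pdc": ["yes", "no"],
    "fuel": ["4l", "6l", "10l"],
    "skibag": ["yes", "no"],
    "4-wheel": ["yes", "no"],
}

# Each constraint is a predicate over a complete assignment (a dict).
# An implication "a => b" is encoded as "not a or b".
C_KB = {
    "c1": lambda a: a["4-wheel"] != "yes" or a["type"] == "xdrive",
    "c2": lambda a: a["skibag"] != "yes" or a["type"] != "city",
    "c3": lambda a: a["fuel"] != "4l" or a["type"] == "city",
    "c4": lambda a: a["fuel"] != "6l" or a["type"] != "xdrive",
}
C_R = {
    "c5": lambda a: a["type"] == "combi",
    "c6": lambda a: a["fuel"] == "4l",
    "c7": lambda a: a["4-wheel"] == "yes",
}


def solutions(constraints):
    """Yield every complete assignment that satisfies all given constraints."""
    names = list(DOMAINS)
    for values in product(*(DOMAINS[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints.values()):
            yield assignment


def is_consistent(constraints):
    """A constraint set is consistent if at least one valid configuration exists."""
    return next(solutions(constraints), None) is not None


print(is_consistent(C_KB))             # True: the knowledge base alone has solutions
print(is_consistent({**C_KB, **C_R}))  # False: the requirements over-constrain it
```

Each assignment yielded by solutions is a valid configuration in the sense of Definition 2, since it is both complete and consistent.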
For the configuration task introduced in Section 2 we are not able to find a solution; for example, a combi-type car does not support a fuel consumption of 4l per 100 kilometers. Consequently, we want to identify minimal sets of constraints (c_i ∈ C_R) which have to be deleted in order to be able to identify a solution (restore consistency). In the example of Section 2 the set of constraints C_R = {c_5, c_6, c_7} is inconsistent with the constraints C_KB = {c_1, c_2, c_3, c_4}, i.e., no solution can be found for the underlying configuration task. A standard approach to determine a minimal set of constraints that have to be deleted from an over-constrained problem is to resolve all minimal conflicts contained in the constraint set. The determination of such constraints is based on a conflict detection algorithm (see, e.g., [18]); the derivation of the corresponding diagnoses is based on the calculation of hitting sets [24]. Since both the notion of a (minimal) conflict and the notion of a (minimal) diagnosis will be used in the following sections, we provide the corresponding definitions here.

Definition 3 (Conflict Set). A conflict set is a set CS ⊆ C_R s.t. C_KB ∪ CS is inconsistent. CS is minimal if there does not exist a conflict set CS' with CS' ⊂ CS.

In our working example we can identify three minimal conflict sets, which are CS_1 = {c_5, c_6}, CS_2 = {c_6, c_7}, and CS_3 = {c_5, c_7}. CS_1, CS_2, and CS_3 are conflict sets since each of CS_1 ∪ C_KB, CS_2 ∪ C_KB, and CS_3 ∪ C_KB is inconsistent. The minimality property is fulfilled since there does not exist a conflict set CS_4 with CS_4 ⊂ CS_1 or CS_4 ⊂ CS_2 or CS_4 ⊂ CS_3. (Note that constraints are not necessarily unary or binary; we tried to keep the example simple. They can also be n-ary.) The standard approach to resolve the given conflicts is the construction of a corresponding hitting set directed acyclic graph (HSDAG) [24], where the resolution of all minimal conflict sets automatically corresponds to the identification of a minimal diagnosis.
A minimal diagnosis in our application context is a minimal set of customer requirements contained in the set of car features (C_R) that has to be deleted from C_R in order to make the remaining constraints consistent with C_KB. Since we are dealing with the diagnosis of customer requirements, we introduce the definition of a customer requirements diagnosis problem (Definition 4). This definition is based on the definition given in [10].

Definition 4 (CR Diagnosis Problem). A customer requirements diagnosis (CR diagnosis) problem is defined as a tuple (C_KB, C_R) where C_R is the set of given customer requirements and C_KB represents the constraints part of the configuration knowledge base.

The definition of a CR diagnosis that corresponds to a given CR diagnosis problem is the following (see Definition 5).
Definition 5 (CR Diagnosis). A CR diagnosis for a CR diagnosis problem (C_KB, C_R) is a set ∆ ⊆ C_R s.t. C_KB ∪ (C_R − ∆) is consistent. ∆ is minimal if there does not exist a diagnosis ∆' ⊂ ∆ s.t. C_KB ∪ (C_R − ∆') is consistent.

The HSDAG algorithm for determining minimal diagnoses is discussed in detail in [24]. The concept of this algorithm will be explained on the basis of our working example. It relies on a conflict detection algorithm that is responsible for detecting minimal conflicts in a given set of constraints (in our case in the given customer requirements). One conflict detection algorithm is QuickXPlain [18], which is based on an efficient divide-and-conquer search strategy. For the purposes of our working example let us assume that the first minimal conflict set determined by QuickXPlain is the set CS_1 = {c_5, c_6}. Due to the minimality property, we are able to resolve each conflict by simply deleting one element from the set; for example, in the case of CS_1 we have to delete either c_5 or c_6. Each variant to resolve a conflict set is represented by a specific path in the corresponding HSDAG; the HSDAG for our working example is depicted in Figure 1. The deletion of c_5 from CS_1 triggers the calculation of another conflict set CS_2 = {c_6, c_7} since (C_R − {c_5}) ∪ C_KB is inconsistent. If we decide to delete c_6 from CS_1, (C_R − {c_6}) ∪ C_KB remains inconsistent, which means that QuickXPlain returns another minimal conflict set, which is CS_3 = {c_5, c_7}.

The original HSDAG algorithm [24] follows a strict breadth-first search regime. Following this strategy, the next node to be expanded in our working example is the minimal conflict set CS_2, which has been returned by QuickXPlain for (C_R − {c_5}) ∪ C_KB. In this context, the first option to resolve CS_2 is to delete c_6. This option is a valid one and ∆_1 = {c_5, c_6} is the resulting minimal diagnosis. The second option for resolving CS_2 is to delete the constraint c_7. In this case, we have identified the next minimal diagnosis ∆_2 = {c_5, c_7} since (C_R − {c_5, c_7}) ∪ C_KB is consistent. This way we are able to identify all minimal sets of constraints ∆_i that, if deleted from C_R, help to restore the consistency with C_KB. If we want to calculate the complete set of diagnoses for our working example, we still have to resolve the conflict set CS_3. The first option to resolve CS_3 is to delete c_5; since {c_5, c_6} has already been identified as a minimal diagnosis, we can close this node in the HSDAG. The second option to resolve CS_3 is to delete c_7. In this case we have determined the third minimal diagnosis, which is ∆_3 = {c_6, c_7}.

In our working example we are able to enumerate all possible diagnoses that help to restore consistency.
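The relationship between minimal conflict sets and minimal diagnoses (every minimal diagnosis is a minimal hitting set of the minimal conflict sets) can be illustrated with a few lines of Python. The sketch below is a simplification, not the HSDAG algorithm of [24]: it assumes that all minimal conflict sets are already known (here CS_1 = {c_5, c_6}, CS_2 = {c_6, c_7}, CS_3 = {c_5, c_7}) and enumerates candidate sets by increasing cardinality instead of building a graph with on-demand conflict detection. The function name minimal_hitting_sets is hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflict_sets):
    """Enumerate all minimal hitting sets, smallest cardinality first."""
    universe = sorted(set().union(*conflict_sets))
    found = []
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            cand = set(candidate)
            # A superset of an already found hitting set cannot be minimal.
            if any(h <= cand for h in found):
                continue
            # A hitting set shares at least one element with every conflict set.
            if all(cand & cs for cs in conflict_sets):
                found.append(cand)
    return found


conflicts = [{"c5", "c6"}, {"c6", "c7"}, {"c5", "c7"}]
for diagnosis in minimal_hitting_sets(conflicts):
    print(sorted(diagnosis))
# ['c5', 'c6']
# ['c5', 'c7']
# ['c6', 'c7']
```

Enumerating by cardinality mirrors the breadth-first regime in the sense that smaller diagnoses are found first, but it scales exponentially in the number of constraints and is meant only to make the hitting set property concrete.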
However, the calculation of all minimal diagnoses is expensive and thus in many cases not practicable for interactive settings. Since users are often interested in a reduced subset of all the potential diagnoses, alternative algorithms are needed that are capable of identifying preferred diagnoses [24, 5, 11]. Such approaches have already been developed [5, 11]; however, they are still based on the resolution of conflict sets, which is computationally expensive (see Section 5). Our idea presented in this paper is a diagnosis algorithm that helps to determine preferred diagnoses without the need to calculate conflict sets. The basic properties of FastDiag will be discussed in Section 4.
FastDiag
Preferred Diagnoses.
Users typically prefer to keep the important requirements and to change or delete (if needed) the less important ones [18]. The major goal of (model-based) diagnosis tasks is to identify the preferred (leading) diagnoses, which are not necessarily minimal cardinality ones [5]. For the characterization of a preferred diagnosis we will rely on the definition of a total ordering of the given set of constraints in C (respectively C_R). Such a total ordering can be achieved, for example, by directly asking the customer regarding the preferences, by applying multi-attribute utility theory [28, 1] where the determined interest dimensions correspond with the attributes of C_R, or by applying the rankings determined by conjoint analysis [2]. The following definition of a lexicographical ordering (Definition 6) is based on total orderings for constraints that have been applied in [18] for the determination of preferred conflict sets.

Definition 6 (Total Lexicographical Ordering). Given a total order < on C, we enumerate the constraints in C in increasing < order c_1 .. c_n, starting with the least important constraints (i.e., c_i < c_j ⇒ i < j). We compare two subsets X and Y of C lexicographically:
X >lex Y iff ∃k: c_k ∈ Y − X and X ∩ {c_{k+1}, ..., c_t} = Y ∩ {c_{k+1}, ..., c_t}.

Based on this definition of a lexicographical ordering, we can now introduce the definition of a preferred diagnosis.

Definition 7 (Preferred Diagnosis). A minimal diagnosis ∆ for a given CR diagnosis problem (C_R, C_KB) is a preferred diagnosis for (C_R, C_KB) iff there does not exist another minimal diagnosis ∆' with ∆' >lex ∆.

In our working example we assumed the lexicographical ordering (c_5 < c_6 < c_7), i.e., the most important customer requirement is c_7 (the 4-wheel functionality). If we assume that X = {c_5, c_7} and Y = {c_6, c_7}, then Y − X = {c_6} and X ∩ {c_7} = Y ∩ {c_7}. Intuitively, {c_5, c_7} is a preferred diagnosis compared to {c_6, c_7} since both diagnoses include c_7 but c_5 is less important than c_6. If we change the ordering to (c_7 < c_6 < c_5), FastDiag would then determine {c_6, c_7} as the preferred minimal diagnosis.

FastDiag Approach.
For the following discussions we introduce the set AC, which is initially set to C_KB ∪ C_R (the union of the customer requirements C_R and the configuration knowledge base C_KB) and subsequently changed when the algorithm runs. The basic idea of the FastDiag algorithm (Algorithm 1) is the following. In our example, the set of customer requirements C_R = {c_5, c_6, c_7} includes at least one minimal diagnosis since C_KB is consistent and C_KB ∪ C_R is inconsistent. (In Algorithm 1 we use the set C instead of C_R since the application of the algorithm is not restricted to inconsistent sets of customer requirements.) In the extreme case C_R itself represents the minimal diagnosis, which then means that all constraints in C_R are part of the diagnosis, i.e., each c_i ∈ C_R represents a singleton conflict. In our case C_R obviously does not represent a minimal diagnosis; the set of diagnoses in our working example is {∆_1 = {c_5, c_6}, ∆_2 = {c_5, c_7}, ∆_3 = {c_6, c_7}} (see Section 3).

The next step in Algorithm 1 is to divide the set of customer requirements C_R = {c_5, c_6, c_7} into the two sets C_1 = {c_5} and C_2 = {c_6, c_7} and to check whether AC − C_1 is already consistent. If this is the case, we can omit the set C_2 since at least one minimal diagnosis can already be identified in C_1. In our case, AC − {c_5} is inconsistent, which means that we have to consider further elements from C_2. Therefore, C_2 = {c_6, c_7} is divided into the sets {c_6} and {c_7}. In the next step we can check whether AC − (C_1 ∪ {c_6}) is consistent. This is the case, which means that we do not have to further take into account {c_7} for determining the diagnosis. Since {c_5} does not include a diagnosis but {c_5} ∪ {c_6} includes a diagnosis, we can deduce that {c_6} must be part of the diagnosis. The final step is to check whether AC − {c_6} leads to a diagnosis without including {c_5}. We see that AC − {c_6} is inconsistent, i.e., ∆_1 = {c_5, c_6} is a minimal diagnosis for the CR diagnosis problem (C_R = {c_5, c_6, c_7}, C_KB = {c_1, ..., c_4}). An execution trace of the FastDiag algorithm in the context of our working example is shown in Figure 2.
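The divide-and-conquer procedure described above (Algorithm 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: consistency is checked by exhaustive enumeration instead of a constraint solver, the list c is assumed to be ordered from the least to the most important constraint, the split point is len(c) // 2, the fuel values 6l and 10l are assumed, and the names DOMAINS, CONSTRAINTS, consistent, fast_diag, and fd are hypothetical.

```python
from itertools import product

DOMAINS = {
    "type": ["city", "limo", "combi", "xdrive"],
    "pdc": ["yes", "no"],
    "fuel": ["4l", "6l", "10l"],  # 6l and 10l are assumed example values
    "skibag": ["yes", "no"],
    "4-wheel": ["yes", "no"],
}

CONSTRAINTS = {
    "c1": lambda a: a["4-wheel"] != "yes" or a["type"] == "xdrive",
    "c2": lambda a: a["skibag"] != "yes" or a["type"] != "city",
    "c3": lambda a: a["fuel"] != "4l" or a["type"] == "city",
    "c4": lambda a: a["fuel"] != "6l" or a["type"] != "xdrive",
    "c5": lambda a: a["type"] == "combi",
    "c6": lambda a: a["fuel"] == "4l",
    "c7": lambda a: a["4-wheel"] == "yes",
}


def consistent(names):
    """True if some complete assignment satisfies every named constraint."""
    variables = list(DOMAINS)
    for values in product(*DOMAINS.values()):
        assignment = dict(zip(variables, values))
        if all(CONSTRAINTS[n](assignment) for n in names):
            return True
    return False


def without(ac, removed):
    """Return ac minus the removed constraints, keeping the order of ac."""
    return [x for x in ac if x not in removed]


def fast_diag(c, ac):
    """Return one minimal diagnosis contained in c (empty list if none)."""
    if not c or not consistent(without(ac, c)):
        return []
    return fd([], c, ac)


def fd(d, c, ac):
    # If the constraints removed last (d) already restored consistency,
    # nothing from c has to be part of the diagnosis.
    if d and consistent(ac):
        return []
    if len(c) == 1:
        return list(c)
    k = len(c) // 2
    c1, c2 = c[:k], c[k:]
    d1 = fd(c1, c2, without(ac, c1))
    d2 = fd(d1, c1, without(ac, d1))
    return d1 + d2


AC = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
print(sorted(fast_diag(["c5", "c6", "c7"], AC)))  # ['c5', 'c6']
print(sorted(fast_diag(["c7", "c6", "c5"], AC)))  # ['c6', 'c7']
```

The two calls differ only in the order of the customer requirements: with c_5 as the least important requirement the result is {c_5, c_6}, and with the reversed ordering it is {c_6, c_7}, which matches the discussion of preferred diagnoses above.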
Algorithm 1 − FastDiag
1  procedure FastDiag(C ⊆ AC, AC = {c_1..c_t}) : diagnosis ∆
2    if isEmpty(C) or inconsistent(AC − C) then return ∅
3    else return FD(∅, C, AC);
4  procedure FD(D, C = {c_1..c_q}, AC) : diagnosis ∆
5    if D ≠ ∅ and consistent(AC) then return ∅
6    else if singleton(C) then return C
7    else k = ⌊q/2⌋;
8      C_1 = {c_1..c_k}; C_2 = {c_{k+1}..c_q};
9      D_1 = FD(C_1, C_2, AC − C_1);
10     D_2 = FD(D_1, C_1, AC − D_1);
11     return (D_1 ∪ D_2);

Calculating n>1 Diagnoses.
In order to be able to calculate n > 1 diagnoses with FastDiag we have to adapt the HSDAG construction introduced in [24] by substituting the resolution of conflicts (see Figure 1) with the deletion of elements c_i from C_R (C) (see Figure 3). (Typically a CR diagnosis problem has more than one related diagnosis.) In this case, a path in the HSDAG is closed if no further diagnoses can be identified for this path or the elements of the current path are a superset of an already closed path (containment check). In conformance with the HSDAG approach presented in [24], we expand the search tree in a breadth-first manner. In our working example, we can delete {c_5} (one element of the first diagnosis ∆_1 = {c_5, c_6}) from the set C_R of diagnosable elements and restart the algorithm for finding another minimal diagnosis for the CR diagnosis problem ({c_6, c_7}, C_KB). Since AC − {c_5} is inconsistent, we can conclude that C_R = {c_6, c_7} includes another minimal diagnosis (∆_2 = {c_6, c_7}, numbered here in the order in which FastDiag finds the diagnoses), which is determined by FastDiag for the CR diagnosis problem (C_R − {c_5}, C_KB). Finally, we have to check whether the CR diagnosis problem ({c_5, c_7}, C_KB) leads to another minimal diagnosis. This is the case, i.e., we have identified the last minimal diagnosis, which is ∆_3 = {c_5, c_7}. The calculation of all diagnoses in our working example on the basis of FastDiag is depicted in Figure 3.

Note that for a given set of constraints (C) FastDiag always calculates the preferred diagnosis in terms of Definition 7. If ∆_1 is the diagnosis returned by FastDiag and we delete one element from ∆_1 (e.g., c_5), then FastDiag returns the preferred diagnosis for the CR diagnosis problem ({c_5, c_6, c_7} − {c_5}, {c_1, ..., c_4}), which is ∆_2 in our example case, i.e., ∆_1 >lex ∆_2. Consequently, diagnoses that are part of one path in the search tree (such as ∆_1 and ∆_2 in Figure 3) are in a strict preference ordering. However, there is only a partial order between diagnoses in the search tree in the sense that a diagnosis at level k is not necessarily preferable to a diagnosis at level k+1.

Figure 1. HSDAG (Hitting Set Directed Acyclic Graph) [24] for the CR diagnosis problem (C_R = {c_5, c_6, c_7}, C_KB = {c_1, c_2, c_3, c_4}). The sets {c_5, c_6}, {c_5, c_7}, and {c_6, c_7} are the minimal diagnoses; the conflict sets CS_1, CS_2, and CS_3 are determined on the basis of QuickXPlain [18].

Figure 2. FastDiag execution trace for the CR diagnosis problem (C_R = {c_5, c_6, c_7}, C_KB = {c_1, c_2, c_3, c_4}). The enumerations 1–6 show the order in which the different incarnations of the FD function (procedure) are activated.

FastDiag Properties.
A detailed listing of the basic operations of FastDiag is shown in Algorithm 1. First, the algorithm checks whether the constraints in C contain a diagnosis, i.e., whether AC − C is consistent; the function assumes that it is activated in the case that AC is inconsistent. If AC − C is inconsistent or C = ∅, FastDiag returns the empty set as result (no solution can be found; line 2 of the algorithm). If at least one diagnosis is contained in the set of constraints C, FastDiag activates the FD function (procedure), which is in charge of retrieving a preferred diagnosis (line 3 of the algorithm). FastDiag follows a divide-and-conquer strategy where the recursive function FD divides the set of constraints (in our case the elements of C_R) into two different subsets (C_1 and C_2) (line 8 of the algorithm) and tries to figure out whether C_1 already contains a diagnosis (line 5 of the algorithm). If this is the case, FastDiag does not further take into account the constraints in C_2. If only one element is remaining in the current set of constraints C and the current set of constraints in AC is still inconsistent, then the element in C is part of a minimal diagnosis (line 6 of the algorithm).

FastDiag is complete in the sense that if C contains exactly one minimal diagnosis then FD will find it. If there are multiple minimal diagnoses then one of them (the preferred one, see Definition 7) is returned. The recursive function FD is triggered if AC − C is consistent and C consists of at least one constraint. In such a situation a corresponding minimal diagnosis can be identified. If we assume the existence of a minimal diagnosis ∆ that cannot be identified by FastDiag, this would mean that there exists at least one constraint c_a in C which is part of the diagnosis but not returned by FD. The only way in which elements can be deleted from C (i.e., not included in a diagnosis) is by the return ∅ statement in FD, and ∅ is only returned in the case that AC is consistent, which means that the elements of C_2 (C_1) from the previous FD incarnation are not part of the preferred diagnosis. Consequently, it is not possible to delete elements from C which are part of the diagnosis.

FastDiag computes only minimal diagnoses in the sense of Definition 5. If we assume the existence of a non-minimal diagnosis ∆ calculated by FastDiag, this would mean that there exists at least one constraint c_a such that ∆ − {c_a} is still a diagnosis. The only situation in which elements of C are added to a diagnosis ∆ is if C itself contains exactly one element.
If C contains only one element (let us assume c_a) and AC is inconsistent (in the function FD), then c_a is the only element that can be deleted from AC, i.e., c_a must be part of the diagnosis.

Performance of FastDiag.
In this section we compare the performance of FastDiag with the performance of the hitting set algorithm [24] in combination with the QuickXPlain conflict detection algorithm introduced in [18].

The worst case complexity of FastDiag in terms of the number of consistency checks needed for calculating one minimal diagnosis is $2d \cdot \log_2(n/d) + 2d$, where d is the minimal diagnosis set size and n is the number of constraints (in C). The best case complexity is $\log_2(n/d) + 2d$. In the worst case each element of the diagnosis is contained in a different path of the search tree: $\log_2(n/d)$ is the depth of the path, and $2d$ represents the branching factor and the number of leaf-node consistency checks. In the best case all elements of the diagnosis are contained in one path of the search tree.

Figure 3. FastDiag: calculating the complete set of minimal diagnoses. The enumerations 1–6 show the order in which the different incarnations of the FastDiag algorithm are activated.

The worst case complexity of QuickXPlain in terms of consistency checks needed for calculating one minimal conflict set is $2k \cdot \log_2(n/k) + 2k$, where k is the minimal conflict set size and n is again the number of constraints (in C) [18]. The best case complexity of QuickXPlain in terms of the number of consistency checks needed is $\log_2(n/k) + 2k$ [18]. Consequently, the number of consistency checks per conflict set (QuickXPlain) and the number of consistency checks per diagnosis (FastDiag) fall into a logarithmic complexity class.

Let n_cs be the number of minimal conflict sets in a constraint set and n_diag be the number of minimal diagnoses; then determining all diagnoses requires n_diag FD calls (see Algorithm 1) plus n_cs additional consistency checks, and n_cs activations of QuickXPlain with n_diag additional consistency checks. The results of a performance evaluation of FastDiag are depicted in Figures 4–7. The basis for these evaluations was the bicycle configuration knowledge base taken from the CLib configuration benchmarks library (34 variables and about 65 constraints). For this example knowledge base we randomly generated different sets of requirements (of cardinality 5, 7, 10, and all possible requirements) and measured the performance of calculating corresponding diagnosis sets (the first diagnosis, first 5 diagnoses, first 10 diagnoses, and all diagnoses). The runtime performance of the different diagnosis algorithms and the needed number of TP (theorem prover) calls are shown in Figures 4–7. As solver we used the CLib based decision diagram representation, which allows for backtracking-free solution search. The tests have been executed on a standard desktop computer (Intel(R) Core(TM)2 Quad CPU QD9400 with … and … RAM). Note that we also evaluated the performance of FastDiag with other benchmark configuration knowledge bases from the CLib web page, with basically the same result. FastDiag proves to be a valuable alternative for determining diagnoses in interactive settings, especially for calculating the preferred first-n solutions.

Figure 4 shows a comparison between the hitting set based diagnosis approach (denoted as HSDAG) and the FastDiag algorithm (denoted as FastDiag) in the case that only one diagnosis is calculated. FastDiag clearly outperforms the HSDAG approach independent of the way in which diagnoses are calculated (breadth-first or best-first). Figure 5 shows the performance evaluation for calculating the topmost-5 minimal diagnoses. The result is similar to the one for calculating the first diagnosis, i.e., FastDiag outperforms the two HSDAG versions. Our evaluations show that FastDiag is very efficient in calculating preferred minimal diagnoses.

Empirical Evaluation.
Based on a computer configuration dataset of the Graz University of Technology (N = 415 configurations) we evaluated the three presented approaches w.r.t. their capability of predicting diagnoses that are acceptable for the user (diagnoses leading to selected configurations). Each entry of the dataset consists of a set of initial user requirements C_R inconsistent with the configuration knowledge base C_KB and the configuration which had been finally selected by the user. Since the original requirements stored in the dataset are inconsistent with the configuration knowledge base, we could determine those diagnoses that indicated which minimal sets of requirements have to be deleted in order to be able to find a solution.

We evaluated the prediction accuracy of the three diagnosis approaches (HSDAG breadth-first, FastDiag, and HSDAG best-first). First, we measured the distance between the predicted position of a diagnosis leading to a selected configuration and the expected position of the diagnosis (which is 1). This distance was measured in terms of the root mean square deviation, RMSD (see Formula 1). Table 1 depicts the results of this first analysis. An important result is that FastDiag has the lowest RMSD value (0.95). Best-first HSDAG has a similar prediction quality (RMSD = 0.97). Finally, breadth-first HSDAG has the worst prediction quality (RMSD = 1.64).
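To make the RMSD measure (Formula 1) concrete, the following Python sketch computes it for a list of predicted positions, with 1 as the expected position. The function name rmsd and the sample positions are hypothetical and chosen only for illustration; they are not taken from the evaluated dataset.

```python
import math


def rmsd(predicted_positions, expected_position=1):
    """Root mean square deviation of predicted ranks from the expected rank."""
    n = len(predicted_positions)
    squared = sum((p - expected_position) ** 2 for p in predicted_positions)
    return math.sqrt(squared / n)


# Hypothetical ranks at which the diagnosis leading to the selected
# configuration appeared in five sessions.
positions = [1, 1, 2, 3, 1]
print(rmsd(positions))  # 1.0
```

A value of 0 would mean that the diagnosis leading to the selected configuration was always ranked first.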
$$\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{1}^{n} (\text{predicted position} - 1)^2} \tag{1}$$

RMSD is an often used quality estimate, but it provides only a limited view of the precision of a (diagnosis) prediction. Therefore we wanted to analyze the precision of the diagnosis selection strategies discussed in this paper; a measure for the precision of a diagnosis algorithm is depicted in Formula 2. The idea behind this measure is to describe how often a diagnosis that leads to a selected configuration (selected by the user) is among the topmost-n ranked diagnoses. As shown in Table 2, FastDiag and best-first HSDAG have the highest prediction accuracy in terms of precision, whereas the breadth-first HSDAG approach shows the worst precision.

$$\mathit{precision} = \frac{|\text{correctly predicted diagnoses}|}{|\text{predicted diagnoses}|} \tag{2}$$

We applied pairwise Mann-Whitney U tests in order to statistically analyze differences between the three diagnosis approaches in terms of ranking behavior. We could identify a significant difference between the rankings of best-first HSDAG and breadth-first HSDAG based diagnosis ( p = 6 . e − ) and also between FastDiag and breadth-first HSDAG based diagnosis ( p < . e − ). There was no significant difference between best-first HSDAG and FastDiag in terms of ranking behavior ( p = 0 . ).

Figure 4. Calculating the first minimal diagnosis with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the left-hand side (lhs) and number of needed TP calls on the right-hand side (rhs).

Figure 5. Calculating the topmost-5 minimal diagnoses with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the lhs and number of needed TP calls on the rhs.

Knowledge Base Analysis. The authors of [10] introduce an algorithm for the automated debugging of configuration knowledge bases. The idea is to combine a conflict detection algorithm such as QuickXplain [18] with the hitting set algorithm used in model-based diagnosis (MBD) [24] for the calculation of minimal diagnoses. In this context, conflicts are induced by test cases (examples) that, for example, stem from previous configuration sessions, have been automatically generated, or have been explicitly defined by domain experts. Further applications of MBD in constraint set debugging are introduced in [9], where diagnosis concepts are used to identify minimal sets of faulty transition conditions in state charts, and in [12], where MBD is applied for the identification of faulty utility constraint sets in the context of knowledge-based recommendation. In contrast to [10, 9, 12], our work provides an algorithm that determines diagnoses directly, without the need to determine the corresponding conflict sets. FastDiag can be applied in knowledge engineering scenarios for calculating preferred diagnoses for faulty knowledge bases, given that we are able to determine a reasonable ordering for the given set of constraints; this could be achieved, for example, by the application of corresponding complexity metrics [4].
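Returning to the two ranking measures of Formulas 1 and 2, the following Python sketch shows how they could be computed from a list of predicted positions. It is an illustration only: the function names, the sample positions, and the reading that the ideal position of the diagnosis leading to the selected configuration is 1 (the top of the ranking) are assumptions of this sketch, not data from the study.

```python
from math import sqrt

def rmsd(positions):
    """Root mean square deviation of predicted positions from position 1.

    Assumption of this sketch: the ideal rank of the diagnosis that leads
    to the configuration selected by the user is 1.
    """
    return sqrt(sum((p - 1) ** 2 for p in positions) / len(positions))

def precision_at(n, positions):
    """Share of sessions in which the relevant diagnosis is among the top n."""
    hits = sum(1 for p in positions if p <= n)
    return hits / len(positions)

# Hypothetical ranks of the relevant diagnosis in five sessions.
positions = [1, 1, 2, 3, 1]

print(rmsd(positions))
for n in (1, 2, 3):
    print(n, precision_at(n, positions))
```

A lower RMSD means that the relevant diagnosis tends to be ranked closer to the top, while precision for increasing n can only stay equal or grow, as in Table 2.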
Conflict Detection. In contrast to the algorithm presented in this paper, calculating diagnoses for inconsistent requirements typically relies on the existence of (minimal) conflict sets. QuickXplain [18], a well-known algorithm whose number of consistency checks is logarithmic in the number of constraints in the knowledge base and depends on the cardinality of the minimal conflicts, has made a major contribution to more efficient interactive constraint-based applications. QuickXplain is based on a divide-and-conquer strategy. FastDiag relies on the same principle of divide-and-conquer but with a different focus, namely the determination of minimal diagnoses. QuickXplain calculates minimal conflict sets based on the assumption of a linear preference ordering among the constraints. Similarly, if we assume a linear preference ordering of the constraints in C, FastDiag calculates preferred diagnoses.
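To illustrate the divide-and-conquer principle, the following Python sketch follows the recursive structure of FastDiag. It is a simplified illustration rather than the implementation used in the evaluation: the brute-force `consistent` check (a stand-in for a constraint solver), the representation of constraints as (name, predicate) pairs, the helper `without`, the extra guard for consistent input, and the toy constraints are all assumptions of this sketch. Here, constraints listed earlier in C are the first candidates for removal.

```python
from itertools import product

def consistent(constraints, domains):
    """Brute-force stand-in for a constraint solver (assumption of this sketch)."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for _, check in constraints):
            return True
    return False

def without(constraints, removed):
    """Return the constraints whose names do not occur in `removed`."""
    removed_names = {name for name, _ in removed}
    return [c for c in constraints if c[0] not in removed_names]

def fd(d, c, ac, domains):
    # If the last step removed constraints (d) and the rest is consistent,
    # nothing from c has to be part of the diagnosis.
    if d and consistent(ac, domains):
        return []
    if len(c) == 1:
        return list(c)
    k = len(c) // 2
    c1, c2 = c[:k], c[k:]
    d1 = fd(c1, c2, without(ac, c1), domains)  # diagnose c2 with c1 removed
    d2 = fd(d1, c1, without(ac, d1), domains)  # diagnose c1 with d1 removed
    return d1 + d2

def fastdiag(c, ac, domains):
    """c: diagnosable constraints; ac: all constraints (c is a subset of ac)."""
    if not c or not consistent(without(ac, c), domains):
        return []  # no diagnosis can be found within c
    if consistent(ac, domains):
        return []  # guard added in this sketch: nothing to diagnose
    return fd([], c, ac, domains)

# Toy over-constrained problem on a single variable x in {0, 1, 2, 3}.
domains = {"x": range(4)}
c = [
    ("c1", lambda a: a["x"] > 2),
    ("c2", lambda a: a["x"] < 2),
    ("c3", lambda a: a["x"] == 1),
    ("c4", lambda a: a["x"] != 0),
]
diagnosis = fastdiag(c, c, domains)
print([name for name, _ in diagnosis])  # prints ['c1']
```

Removing c1 restores consistency (x = 1 satisfies c2, c3, and c4), and no conflict set had to be computed on the way.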
Interactive Settings. Note that in the interactive configuration scenario discussed in this paper our goal was to support open configuration, which lets the user explore the configuration space while the system proactively points out inconsistent requirements; such functionality is often provided by commercial configuration environments. The authors of [23] focus on interactive settings where users of constraint-based applications are confronted with situations where no solution can be found. In this context, [23] introduce the concept of minimal exclusion sets, which correspond to the concept of minimal diagnoses as defined in [24]. As mentioned, the major focus of [23] is on settings where the proposed algorithm supports users in the identification of acceptable exclusion sets. The authors propose an algorithm (representative explanations) that helps to improve the quality of the presented exclusion sets (in terms of diversity) and thus increases the probability of finding an exclusion set that is acceptable for the user. Our diagnosis approach calculates preferred diagnoses in terms of a predefined ordering of the constraint set. Thus, compared to the work of [23], we follow a different approach that focuses more on preferences than on the degree of representativeness.
Diagnosis Algorithms. There are a couple of algorithms that help to improve the efficiency of diagnosis determination; they are further developments of the original algorithm introduced by Reiter [24]. These approaches focus on making the construction of hitting sets more efficient. Wotawa [29] introduces an algorithm that reduces the number of subset checks compared to the original HSDAG approach [24]. Fijany et al. [13] introduce an approach to represent the problem of determining minimal hitting sets as a corresponding integer programming problem. Further approaches to optimize the determination of hitting sets are discussed in [20]. All the mentioned approaches rely on (minimal) conflict sets as the basis for calculating a set of minimal diagnoses, whereas FastDiag is a complete and minimal diagnosis algorithm that does not need conflict sets. It is important to mention that, especially when calculating the first n diagnoses (for n > 1, i.e., not a single diagnosis), FastDiag can also exploit the mentioned algorithms of [20, 29] for the calculation of more than one diagnosis, i.e., it is not bound to the usage of the original HSDAG algorithm. Lin et al. [19] introduce an approach to determine hitting sets on the basis of genetic algorithms; a similar approach to the determination of diagnoses is presented in [8], who introduce a stochastic fault diagnosis algorithm based on greedy stochastic search. Such approaches have been shown to improve search performance significantly; however, there is no general guarantee of completeness and diagnosis minimality. Finally, there exist a couple of algorithms that improve the performance of diagnosis calculation by using additional knowledge about the structural properties of the diagnosis problem. For example, [17] show the determination of (minimal) diagnoses for the case of conjunctive queries on database tables (the set of diagnoses can be precompiled by executing the individual parts of the query on the given dataset). Siddiqi et al. [25] show one approach to exploit structural properties of system descriptions to improve the overall performance of diagnosis determination; in this case, cones are areas in a gate with a certain structure and a certain probability of including a diagnosis, and the search process focuses on exactly those areas. FastDiag does not exploit specific properties of the underlying constraint set; however, taking such properties into account can further improve the performance of the algorithm. Corresponding evaluations are within the scope of future work.

Figure 6. Calculating the topmost-10 minimal diagnoses with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the lhs and number of needed TP calls on the rhs.

Figure 7. Calculating all minimal diagnoses with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the lhs and number of needed TP calls on the rhs.
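For contrast with the direct approach, the conflict-based route can be sketched as follows: once the minimal conflict sets are known, every minimal hitting set of these conflicts is a minimal diagnosis. The Python sketch below enumerates hitting sets by increasing cardinality. It is deliberately naive and is not Reiter's HSDAG [24] or any of the optimized variants cited above; the function name and the two example conflicts are assumptions made for illustration.

```python
from itertools import combinations

def minimal_hitting_sets(conflicts):
    """Enumerate all minimal hitting sets of the given conflict sets.

    Candidates are tried by increasing size, so every hitting set that is
    not a superset of an already found one is minimal.
    """
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            s = set(candidate)
            if any(h <= s for h in found):
                continue  # contains a smaller hitting set, hence not minimal
            if all(s & conflict for conflict in conflicts):
                found.append(s)
    return found

# Hypothetical minimal conflict sets over constraints c1..c3.
conflicts = [{"c1", "c2"}, {"c2", "c3"}]
diagnoses = minimal_hitting_sets(conflicts)
print(diagnoses)
```

For the two example conflicts the minimal diagnoses are {c2} and {c1, c3}. The exponential enumeration is acceptable only for such toy inputs, which is one reason why the cited work concentrates on more efficient hitting set construction.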
Personalized Diagnosis. Many of the existing diagnosis approaches do not take into account the need for personalizing the set of diagnoses to be presented to a user. Identifying diagnoses of interest in an efficient manner is a clear benefit regarding the acceptance of the underlying application; for example, users of a configurator application are not necessarily interested in minimal cardinality diagnoses [24] but rather in those that correspond to their current preferences. A first step towards the application of personalization concepts in the context of knowledge-based recommendation is presented in [11]. The authors introduce an approach that calculates leading diagnoses on the basis of similarity measures used for determining n-nearest neighbors. A general approach to the identification of preferred diagnoses is introduced in [5], where probability estimates are used to determine the leading diagnoses with the overall goal of minimizing the number of measurements needed for identifying a malfunctioning device. Basic principles of determining diagnoses in knowledge-based recommendation scenarios are discussed in [17]. Furthermore, [16] introduce a logical characterization of preferences, which are expressed as preference relations on single diagnoses and modal logical formulas on groups of diagnoses. In contrast to our work, [16] do not provide an algorithm to efficiently calculate preferred diagnoses. We see our work as a major contribution in this context since FastDiag helps to identify leading diagnoses more efficiently; further empirical studies in different application contexts are a major focus of our future work.
In this paper we have introduced a new diagnosis algorithm (FastDiag) which allows the efficient calculation of one diagnosis at a time with logarithmic complexity in terms of the number of consistency checks. Thus, the computational complexity of calculating one minimal diagnosis is equal to that of calculating one minimal conflict set in hitting set based diagnosis approaches. The algorithm is especially applicable in settings where the number of conflict sets is equal to or larger than the number of diagnoses, or in settings where preferred (leading) diagnoses are needed. Issues for future work are the determination of repair actions for diagnoses, the further development of FastDiag for supporting anytime diagnosis tasks, and the conduct of further empirical studies in different configurator application domains.

Table 1. Root Mean Square Deviation (RMSD) of the diagnosis approaches.

          breadth-first (HSDAG)    FastDiag    best-first (HSDAG)
RMSD      1.64                     0.95        0.97

Table 2. Precision of FastDiag vs. HSDAG based approaches.

top-n diagnoses    breadth-first (HSDAG)    FastDiag    best-first (HSDAG)
n=1                0.51                     0.70        0.74
n=2                0.75                     0.88        0.89
n=3                0.87                     0.97        0.96
References

[1] L. Ardissono, A. Felfernig, G. Friedrich, D. Jannach, G. Petrone, R. Schaefer, and M. Zanker. A Framework for the development of personalized, distributed web-based configuration systems. AI Magazine, 24(3):93–108, 2003.
[2] F. Belanger. A conjoint analysis of online consumer satisfaction. Journal of Electronic Commerce Research, 6:95–111, 2005.
[3] L. Castillo, D. Borrajo, and M. Salido. Planning, Scheduling and Constraint Satisfaction: From Theory to Practice. IOS Press, 2005.
[4] Z. Chen and C. Suen. Measuring the complexity of rule-based expert systems. Expert Systems with Applications, 7(4):467–481, 2003.
[5] J. DeKleer. Using crude probability estimates to guide diagnosis. AI Journal, 45(3):381–391, 1990.
[6] J. DeKleer, A. Mackworth, and R. Reiter. Characterizing diagnoses and systems. AI Journal, 56(2–3):197–222, 1992.
[7] J. DeKleer and B. Williams. Diagnosing multiple faults. AI Journal, 32(1):97–130, 1987.
[8] A. Feldman, G. Provan, and A. Gemund. Computing minimal diagnoses by greedy stochastic search. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI'08), pages 911–918, Chicago, IL, 2008.
[9] A. Felfernig, G. Friedrich, K. Isak, K. Shchekotykhin, E. Teppan, and D. Jannach. Automated debugging of recommender user interface descriptions. Journal of Applied Intelligence, 31(1):1–14, 2007.
[10] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner. Consistency-based diagnosis of configuration knowledge bases. AI Journal, 152(2):213–234, 2004.
[11] A. Felfernig, G. Friedrich, M. Schubert, M. Mandl, M. Mairitsch, and E. Teppan. Plausible repairs for inconsistent requirements. In , pages 791–796, Pasadena, CA, 2009.
[12] A. Felfernig, G. Friedrich, E. Teppan, and K. Isak. Intelligent debugging and repair of utility constraint sets in knowledge-based recommender applications. In , pages 218–226, Canary Islands, Spain, 2008.
[13] A. Fijany and F. Vatan. New approaches for efficient solutions of hitting set problems. In International Symposium of Information and Communication Technologies, pages 1–10, Cancun, Mexico, 2004.
[14] G. Fleischanderl, G. Friedrich, A. Haselboeck, H. Schreiner, and M. Stumptner. Configuring large systems using generative constraint satisfaction. IEEE Intelligent Systems, 13(4):59–68, 1998.
[15] G. Friedrich and K. Shchekotykhin. A general diagnosis method for ontologies. In , number 3729 in Lecture Notes in Computer Science, pages 232–246, Galway, Ireland, 2005. Springer.
[16] P. Froehlich, W. Nejdl, and M. Schroeder. A formal semantics for preferences and strategies in model-based diagnosis. In , pages 106–113, 1994.
[17] D. Jannach and J. Liegl. Conflict-directed relaxation of constraints in content-based recommender systems. In IEA/AIE 2006, pages 819–829, Annecy, France, 2006.
[18] U. Junker. QuickXplain: Preferred explanations and relaxations for over-constrained problems. In , pages 167–172, San Jose, CA, 2004.
[19] L. Lin and Y. Jiang. Computing minimal hitting sets with genetic algorithms. Algorithmica, 32(1):95–106, 2002.
[20] L. Lin and Y. Jiang. The computation of hitting sets: review and new algorithm. Information Processing Letters, 86:177–184, 2003.
[21] J. Marques-Silva and K. Sakallah. GRASP: A new search algorithm for satisfiability. In International Conference on Computer-Aided Design, pages 220–227, Santa Clara, CA, 1996.
[22] S. Mittal and F. Frayman. Towards a generic model of configuration tasks. In , pages 1395–1401, Detroit, MI, 1989.
[23] B. O'Sullivan, A. Papadopoulos, B. Faltings, and P. Pu. Representative explanations for over-constrained problems. In , pages 323–328, Vancouver, Canada, 2007.
[24] R. Reiter. A theory of diagnosis from first principles. AI Journal, 23(1):57–95, 1987.
[25] S. Siddiqi and J. Huang. Hierarchical diagnosis of multiple faults. In , pages 581–586, Hyderabad, India, 2007.
[26] C. Sinz and A. Haag. Configuration. IEEE Intelligent Systems, 22(1):78–90, 2007.
[27] E. Tsang. Foundations of Constraint Satisfaction. Academic Press, 1993.
[28] D. Winterfeldt and W. Edwards. Decision analysis and behavioral research. Cambridge University Press, 1986.
[29] F. Wotawa. A variant of Reiter's hitting-set algorithm.