An Efficient Diagnosis Algorithm for Inconsistent Constraint Sets

Alexander Felfernig, Monika Schubert, and Christoph Zehentner

Abstract

Constraint sets can become inconsistent in different contexts. For example, during a configuration session the set of customer requirements can become inconsistent with the configuration knowledge base. Another example is the engineering phase of a configuration knowledge base, where the underlying constraints can become inconsistent with a set of test cases. In such situations we need techniques that support the identification of minimal sets of faulty constraints that have to be deleted in order to restore consistency. In this paper we introduce a divide-and-conquer based diagnosis algorithm (FastDiag) which identifies minimal sets of faulty constraints in an over-constrained problem. This algorithm is specifically applicable in scenarios where the efficient identification of leading (preferred) diagnoses is crucial. We compare the performance of FastDiag with the conflict-directed calculation of hitting sets and present an in-depth performance analysis that shows the advantages of our approach.

Keywords: Interactive Configuration, Preferred Diagnoses, Direct Diagnosis, Model-based Diagnosis, Inconsistent Constraint Sets.

Constraint technologies [27] are applied in different areas such as configuration [22, 14, 26], recommendation [11], and scheduling [3]. There are many scenarios where the underlying constraint sets can become over-constrained. For example, when implementing a configuration knowledge base, constraints can become inconsistent with a set of test cases [10]. Alternatively, when interacting with a configurator application [23, 11], the given set of customer requirements (represented as constraints) can become inconsistent with the configuration knowledge base. In both situations there is a need for intelligent assistance that actively supports users of a constraint-based application (end users or knowledge engineers). A widespread approach to support users in the identification of minimal sets of faulty constraints is to combine conflict detection (see, e.g., [18]) with a corresponding hitting set algorithm [7, 24, 6]. In their original form these algorithms are applied for the calculation of minimal (cardinality) diagnoses, which are typically determined with breadth-first search. Further diagnosis algorithms have been developed that follow a best-first search regime, where the expansion of the hitting set search tree is guided by failure probabilities of components [5]. Another example of such an approach is presented in [11], where similarity metrics are used to guide the (best-first) search for a preferred (plausible) minimal diagnosis (including repairs).

Preprint of: A. Felfernig, M. Schubert, and C. Zehentner. An Efficient Diagnosis Algorithm for Inconsistent Constraint Sets. Artificial Intelligence for Engineering Design, Analysis, and Manufacturing (AIEDAM), Cambridge University Press, vol. 26, no. 1, pp. 53-62, 2012.

TU Graz, Institute of Software Technology, Applied Software Engineering & AI, Austria, email: {felfernig, schubert, zehentner}@ist.tugraz.at

Both simple breadth-first search and best-first search diagnosis approaches rely predominantly on the calculation of conflict sets [18]. In this context, the determination of a minimal diagnosis of cardinality n requires the identification of at least n minimal conflict sets. In this paper, we introduce a direct diagnosis algorithm (FastDiag) that determines one minimal diagnosis at a time with the same computational effort needed for determining one conflict set at a time. FastDiag supports the identification of preferred diagnoses given predefined preferences regarding a set of decision alternatives. It boosts the applicability of diagnosis methods in scenarios such as online configuration & reconfiguration [10], recommendation of products & services [11], and (more generally) scenarios where the efficient calculation of preferred (leading) diagnoses is crucial [5]. FastDiag is not restricted to constraint-based systems; it is also applicable, for example, in the context of SAT solving [21] and description logics reasoning [15].

The remainder of this paper is organized as follows. In Section 2 we introduce a simple example configuration task from the automotive domain. In Section 3 we discuss the basic hitting set based approach to the calculation of diagnoses. In Section 4 we introduce an algorithm (FastDiag) for calculating preferred diagnoses for a given over-constrained problem. In Section 5 we present a detailed evaluation of FastDiag, which clearly outperforms standard hitting set based algorithms in the calculation of the topmost-n preferred diagnoses. Section 6 provides an overview of related work in the field. The paper is concluded with Section 7.

Car configuration will serve as a working example throughout this paper. Since we exploit configuration problems for the discussion of our diagnosis algorithm, we first introduce a formal definition of a configuration task. This definition is based on [10] but is given in the context of a constraint satisfaction problem (CSP) [27].

Definition 1 (Configuration Task). A configuration task can be defined as a CSP (V, D, C). V = {v1, v2, ..., vn} represents a set of finite domain variables. D = {dom(v1), dom(v2), ..., dom(vn)} represents a set of variable domains, where dom(vk) represents the domain of variable vk. C = C_KB ∪ C_R, where C_KB = {c1, c2, ..., cq} is a set of domain-specific constraints (the configuration knowledge base) that restrict the possible combinations of values assigned to the variables in V, and C_R = {cq+1, cq+2, ..., ct} is a set of customer requirements, also represented as constraints.

A simplified example of a configuration task in the automotive domain is the following. In this example, type represents the car type, pdc is the park distance control functionality, fuel represents the fuel consumption per 100 kilometers, a skibag allows ski stowage inside the car, and 4-wheel represents the corresponding actuation type. These variables describe the potential set of requirements that can be specified by the user (customer). The possible combinations of these requirements are defined by a set of constraints, denoted as the configuration knowledge base (C_KB), which is defined as C_KB = {c1, c2, c3, c4} in our example. Furthermore, we assume the set of customer requirements C_R = {c5, c6, c7}.

• V = {type, pdc, fuel, skibag, 4-wheel}
• D = {dom(type) = {city, limo, combi, xdrive}, dom(pdc) = {yes, no}, dom(fuel) = {4l, 6l, 10l}, dom(skibag) = {yes, no}, dom(4-wheel) = {yes, no}}
• C_KB = {c1: 4-wheel = yes ⇒ type = xdrive, c2: skibag = yes ⇒ type ≠ city, c3: fuel = 4l ⇒ type = city, c4: fuel = 6l ⇒ type ≠ xdrive}
• C_R = {c5: type = combi, c6: fuel = 4l, c7: 4-wheel = yes}

On the basis of this configuration task definition, we can now introduce the definition of a concrete configuration (solution for a configuration task).

Definition 2 (Configuration).
A configuration for a given configuration task (V, D, C) is an instantiation I = {v1 = ins1, v2 = ins2, ..., vn = insn} where insk ∈ dom(vk). A configuration is consistent if the assignments in I are consistent with the ci ∈ C. Furthermore, a configuration is complete if all variables in V are instantiated. Finally, a configuration is valid if it is consistent and complete.
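The working example can be made concrete with a small script. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: each constraint is modeled as a predicate over a complete assignment, exhaustive enumeration of all 96 assignments stands in for a real constraint solver, and the names `DOMAINS`, `CONSTRAINTS`, `find_configuration`, and `consistent` are hypothetical. It checks Definition 2 directly: a returned assignment instantiates every variable (complete) and satisfies every listed constraint (consistent).

```python
from itertools import product

# Working example of Section 2 (hypothetical encoding: constraints as predicates).
DOMAINS = {
    "type": ("city", "limo", "combi", "xdrive"),
    "pdc": ("yes", "no"),
    "fuel": ("4l", "6l", "10l"),
    "skibag": ("yes", "no"),
    "4-wheel": ("yes", "no"),
}

# "a => b" is encoded as "not a or b".
CONSTRAINTS = {
    "c1": lambda a: a["4-wheel"] != "yes" or a["type"] == "xdrive",
    "c2": lambda a: a["skibag"] != "yes" or a["type"] != "city",
    "c3": lambda a: a["fuel"] != "4l" or a["type"] == "city",
    "c4": lambda a: a["fuel"] != "6l" or a["type"] != "xdrive",
    "c5": lambda a: a["type"] == "combi",
    "c6": lambda a: a["fuel"] == "4l",
    "c7": lambda a: a["4-wheel"] == "yes",
}
C_KB = ["c1", "c2", "c3", "c4"]
C_R = ["c5", "c6", "c7"]


def find_configuration(names):
    """Return a complete assignment that satisfies every constraint in `names`
    (a valid configuration in the sense of Definition 2), or None."""
    variables = list(DOMAINS)
    for values in product(*(DOMAINS[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(CONSTRAINTS[n](assignment) for n in names):
            return assignment
    return None


def consistent(names):
    """True iff the constraints in `names` admit at least one valid configuration."""
    return find_configuration(names) is not None


print(consistent(C_KB))        # True: the knowledge base alone is satisfiable
print(consistent(C_KB + C_R))  # False: no valid configuration exists
```

Brute-force enumeration is only viable for toy domains like this one; a real configurator would delegate `consistent` to a constraint solver.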

For the configuration task introduced in Section 2 we are not able to find a solution; for example, a combi-type car does not support a fuel consumption of 4l per 100 kilometers. Consequently, we want to identify minimal sets of constraints (ci ∈ C_R) which have to be deleted in order to be able to identify a solution (restore consistency). In the example of Section 2 the set of constraints C_R = {c5, c6, c7} is inconsistent with the constraints C_KB = {c1, c2, c3, c4}, i.e., no solution can be found for the underlying configuration task. A standard approach to determine a minimal set of constraints that have to be deleted from an over-constrained problem is to resolve all minimal conflicts contained in the constraint set. The determination of such conflicts is based on a conflict detection algorithm (see, e.g., [18]); the derivation of the corresponding diagnoses is based on the calculation of hitting sets [24]. Since both the notion of a (minimal) conflict and the notion of a (minimal) diagnosis will be used in the following sections, we provide the corresponding definitions here.

Definition 3 (Conflict Set). A conflict set is a set CS ⊆ C_R s.t. C_KB ∪ CS is inconsistent. CS is minimal if there does not exist a conflict set CS' with CS' ⊂ CS.

In our working example we can identify three minimal conflict sets, which are CS1 = {c5, c6}, CS2 = {c5, c7}, and CS3 = {c6, c7}. CS1, CS2, and CS3 are conflict sets since each of CS1 ∪ C_KB, CS2 ∪ C_KB, and CS3 ∪ C_KB is inconsistent. The minimality property is fulfilled since there does not exist a conflict set CS' with CS' ⊂ CS1, CS' ⊂ CS2, or CS' ⊂ CS3. The standard approach to resolve the given conflicts is the construction of a corresponding hitting set directed acyclic graph (HSDAG) [24], where the resolution of all minimal conflict sets automatically corresponds to the identification of a minimal diagnosis.

(Note that constraints are not necessarily unary or binary; we tried to keep the example simple. They can also be n-ary.)
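Definition 3 can be checked mechanically on the working example. The sketch below uses a hypothetical brute-force Python encoding of Section 2 (helper names are our own) and enumerates subsets of C_R by increasing size, keeping those that are inconsistent with C_KB but contain no smaller conflict. This exhaustive search is exponential in |C_R| and is shown only to confirm CS1, CS2, and CS3; it is not how conflicts are computed in practice (see QuickXPlain [18]).

```python
from itertools import combinations, product

# Hypothetical encoding of the working example (constraints as predicates).
DOMAINS = {
    "type": ("city", "limo", "combi", "xdrive"),
    "pdc": ("yes", "no"),
    "fuel": ("4l", "6l", "10l"),
    "skibag": ("yes", "no"),
    "4-wheel": ("yes", "no"),
}
CONSTRAINTS = {
    "c1": lambda a: a["4-wheel"] != "yes" or a["type"] == "xdrive",
    "c2": lambda a: a["skibag"] != "yes" or a["type"] != "city",
    "c3": lambda a: a["fuel"] != "4l" or a["type"] == "city",
    "c4": lambda a: a["fuel"] != "6l" or a["type"] != "xdrive",
    "c5": lambda a: a["type"] == "combi",
    "c6": lambda a: a["fuel"] == "4l",
    "c7": lambda a: a["4-wheel"] == "yes",
}
C_KB = ["c1", "c2", "c3", "c4"]
C_R = ["c5", "c6", "c7"]


def consistent(names):
    """Brute force: True iff some complete assignment satisfies all `names`."""
    variables = list(DOMAINS)
    for values in product(*(DOMAINS[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(CONSTRAINTS[n](assignment) for n in names):
            return True
    return False


def minimal_conflicts(kb, cr):
    """All minimal conflict sets CS ⊆ cr (Definition 3), found by increasing size."""
    found = []
    for size in range(1, len(cr) + 1):
        for subset in combinations(cr, size):
            cs = set(subset)
            if any(prev <= cs for prev in found):
                continue  # contains a smaller conflict, so it cannot be minimal
            if not consistent(kb + list(subset)):
                found.append(cs)
    return found


print([sorted(cs) for cs in minimal_conflicts(C_KB, C_R)])
# [['c5', 'c6'], ['c5', 'c7'], ['c6', 'c7']]
```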
A minimal diagnosis in our application context is a minimal set of customer requirements contained in the set of car features (C_R) that has to be deleted from C_R in order to make the remaining constraints consistent with C_KB. Since we are dealing with the diagnosis of customer requirements, we introduce the definition of a customer requirements diagnosis problem (Definition 4). This definition is based on the definition given in [10]. Definition 4 (CR Diagnosis Problem). A customer requirements diagnosis (CR diagnosis) problem is defined as a tuple (C_KB, C_R) where C_R is the set of given customer requirements and C_KB represents the constraints part of the configuration knowledge base. The definition of a CR diagnosis that corresponds to a given CR diagnosis problem is the following (see Definition 5).

Definition 5 (CR Diagnosis). A CR diagnosis for a CR diagnosis problem (C_KB, C_R) is a set ∆ ⊆ C_R s.t. C_KB ∪ (C_R − ∆) is consistent. ∆ is minimal if there does not exist a diagnosis ∆' ⊂ ∆ s.t. C_KB ∪ (C_R − ∆') is consistent.

The HSDAG algorithm for determining minimal diagnoses is discussed in detail in [24]. The concept of this algorithm will be explained on the basis of our working example. It relies on a conflict detection algorithm that is responsible for detecting minimal conflicts in a given set of constraints (in our case in the given customer requirements). One conflict detection algorithm is QuickXPlain [18], which is based on an efficient divide-and-conquer search strategy. For the purposes of our working example, let us assume that the first minimal conflict set determined by QuickXPlain is the set CS1 = {c5, c6}. Due to the minimality property, we are able to resolve each conflict by simply deleting one element from the set; for example, in the case of CS1 we have to delete either c5 or c6. Each variant to resolve a conflict set is represented by a specific path in the corresponding HSDAG; the HSDAG for our working example is depicted in Figure 1. The deletion of c5 from CS1 triggers the calculation of another conflict set CS3 = {c6, c7} since (C_R − {c5}) ∪ C_KB is inconsistent. If we decide to delete c6 from CS1, (C_R − {c6}) ∪ C_KB remains inconsistent, which means that QuickXPlain returns another minimal conflict set, which is CS2 = {c5, c7}. The original HSDAG algorithm [24] follows a strict breadth-first search regime. Following this strategy, the next node to be expanded in our working example is the minimal conflict set CS3, which has been returned by QuickXPlain for (C_R − {c5}) ∪ C_KB. In this context, the first option to resolve CS3 is to delete c6. This option is a valid one and ∆1 = {c5, c6} is the resulting minimal diagnosis. The second option for resolving CS3 is to delete the constraint c7. In this case, we have identified the next minimal diagnosis ∆2 = {c5, c7} since (C_R − {c5, c7}) ∪ C_KB is consistent. This way we are able to identify all minimal sets of constraints ∆i that, if deleted from C_R, help to restore consistency with C_KB. If we want to calculate the complete set of diagnoses for our working example, we still have to resolve the conflict set CS2. The first option to resolve CS2 is to delete c5; since {c5, c6} has already been identified as a minimal diagnosis, we can close this node in the HSDAG. The second option to resolve CS2 is to delete c7. In this case we have determined the third minimal diagnosis, which is ∆3 = {c6, c7}. In our working example we are able to enumerate all possible diagnoses that help to restore consistency.
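The HSDAG walkthrough above can be reproduced with the sketch below. It assumes a hypothetical brute-force Python encoding of the working example, implements a QuickXPlain-style divide-and-conquer conflict search in the spirit of [18], and expands a simplified hitting set tree breadth-first, closing every node whose path is a superset of an already found diagnosis. All names are our own, and the tree omits the node-reuse refinements of the full HSDAG [24].

```python
from collections import deque
from itertools import product

# Hypothetical encoding of the working example (constraints as predicates).
DOMAINS = {
    "type": ("city", "limo", "combi", "xdrive"),
    "pdc": ("yes", "no"),
    "fuel": ("4l", "6l", "10l"),
    "skibag": ("yes", "no"),
    "4-wheel": ("yes", "no"),
}
CONSTRAINTS = {
    "c1": lambda a: a["4-wheel"] != "yes" or a["type"] == "xdrive",
    "c2": lambda a: a["skibag"] != "yes" or a["type"] != "city",
    "c3": lambda a: a["fuel"] != "4l" or a["type"] == "city",
    "c4": lambda a: a["fuel"] != "6l" or a["type"] != "xdrive",
    "c5": lambda a: a["type"] == "combi",
    "c6": lambda a: a["fuel"] == "4l",
    "c7": lambda a: a["4-wheel"] == "yes",
}
C_KB = ["c1", "c2", "c3", "c4"]
C_R = ["c5", "c6", "c7"]


def consistent(names):
    """Brute force: True iff some complete assignment satisfies all `names`."""
    variables = list(DOMAINS)
    for values in product(*(DOMAINS[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(CONSTRAINTS[n](assignment) for n in names):
            return True
    return False


def quickxplain(background, constraints):
    """One minimal conflict within `constraints` w.r.t. a consistent
    `background`, or None if background + constraints is consistent."""
    if not constraints or consistent(background + constraints):
        return None
    return _qx(background, [], constraints)


def _qx(background, delta, constraints):
    if delta and not consistent(background):
        return []  # the conflict lies entirely in the background seen so far
    if len(constraints) == 1:
        return list(constraints)
    k = len(constraints) // 2
    c1, c2 = constraints[:k], constraints[k:]
    d2 = _qx(background + c1, c1, c2)
    d1 = _qx(background + d2, d2, c1)
    return d1 + d2


def hs_tree_diagnoses(kb, cr):
    """Breadth-first hitting set tree; a node is the set of requirements
    deleted along its path. Returns minimal diagnoses in discovery order."""
    diagnoses, seen = [], set()
    queue = deque([frozenset()])
    while queue:
        path = queue.popleft()
        if path in seen or any(d <= path for d in diagnoses):
            continue  # duplicate node, or superset of a known diagnosis (closed)
        seen.add(path)
        remaining = [c for c in cr if c not in path]
        conflict = quickxplain(kb, remaining)
        if conflict is None:
            diagnoses.append(path)  # kb + remaining is consistent
        else:
            for c in conflict:  # resolve the conflict by deleting one element
                queue.append(path | {c})
    return diagnoses


print(quickxplain(C_KB, C_R))  # ['c5', 'c6']
print([sorted(d) for d in hs_tree_diagnoses(C_KB, C_R)])
# [['c5', 'c6'], ['c5', 'c7'], ['c6', 'c7']]
```

With C_R ordered as (c5, c6, c7), the first conflict is CS1 = {c5, c6} and the diagnoses appear in the order ∆1, ∆2, ∆3, matching the walkthrough.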
However, the calculation of all minimal diagnoses is expensive and thus in many cases not practicable for interactive settings. Since users are often interested in a reduced subset of all the potential diagnoses, alternative algorithms are needed that are capable of identifying preferred diagnoses [24, 5, 11]. Such approaches have already been developed [5, 11]; however, they are still based on the resolution of conflict sets, which is computationally expensive (see Section 5). Our idea presented in this paper is a diagnosis algorithm that helps to determine preferred diagnoses without the need of calculating conflict sets. The basic properties of FastDiag will be discussed in Section 4.

FastDiag

Preferred Diagnoses.

Users typically prefer to keep the important requirements and to change or delete (if needed) the less important ones [18]. The major goal of (model-based) diagnosis tasks is to identify the preferred (leading) diagnoses, which are not necessarily minimal cardinality ones [5]. For the characterization of a preferred diagnosis we rely on the definition of a total ordering of the given set of constraints in C (respectively C_R). Such a total ordering can be achieved, for example, by directly asking the customer regarding the preferences, by applying multi-attribute utility theory [28, 1], where the determined interest dimensions correspond with the attributes of C_R, or by applying the rankings determined by conjoint analysis [2]. The following definition of a lexicographical ordering (Definition 6) is based on total orderings for constraints, which have been applied in [18] for the determination of preferred conflict sets.

Definition 6 (Total Lexicographical Ordering). Given a total order < on C, we enumerate the constraints in C in increasing < order c1 .. cn, starting with the least important constraints (i.e., ci < cj ⇒ i < j). We compare two subsets X and Y of C lexicographically:

$X >_{lex} Y \iff \exists k: c_k \in Y - X \;\wedge\; X \cap \{c_{k+1}, \ldots, c_n\} = Y \cap \{c_{k+1}, \ldots, c_n\}$

Based on this definition of a lexicographical ordering, we can now introduce the definition of a preferred diagnosis.

Definition 7 (Preferred Diagnosis). A minimal diagnosis ∆ for a given CR diagnosis problem (C_R, C_KB) is a preferred diagnosis for (C_R, C_KB) iff there does not exist another minimal diagnosis ∆' with ∆' >lex ∆.

In our working example we assume the lexicographical ordering (c5 < c6 < c7), i.e., the most important customer requirement is c7 (the 4-wheel functionality). If we assume that X = {c5, c6} and Y = {c5, c7}, then Y − X = {c7}; since no constraint is more important than c7, both X and Y have an empty intersection with the set of more important constraints, hence X >lex Y.
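Definitions 6 and 7 translate into a few lines of Python. In this sketch the function names are hypothetical, `order` lists the constraints from least to most important, and `lex_greater(X, Y, order)` returns True iff X >lex Y; `preferred` simply scans a given list of minimal diagnoses for one that no other diagnosis beats.

```python
def lex_greater(x, y, order):
    """Definition 6: X >lex Y, with `order` listing constraints from least
    to most important."""
    x, y = set(x), set(y)
    for k, c in enumerate(order):
        if c in y - x:
            more_important = set(order[k + 1:])
            if x & more_important == y & more_important:
                return True
    return False


def preferred(diagnoses, order):
    """Definition 7: a diagnosis that no other diagnosis beats under >lex."""
    for d in diagnoses:
        if not any(lex_greater(e, d, order) for e in diagnoses if set(e) != set(d)):
            return d
    return None


DIAGNOSES = [{"c5", "c6"}, {"c5", "c7"}, {"c6", "c7"}]
print(lex_greater({"c5", "c6"}, {"c5", "c7"}, ["c5", "c6", "c7"]))  # True
print(sorted(preferred(DIAGNOSES, ["c5", "c6", "c7"])))             # ['c5', 'c6']
print(sorted(preferred(DIAGNOSES, ["c7", "c6", "c5"])))             # ['c6', 'c7']
```

Intuitively, X >lex Y compares the two sets from the most important constraint downwards and prefers the set that omits the first constraint on which they differ.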
Intuitively, {c5, c6} is a preferred diagnosis compared to {c5, c7} since both diagnoses include c5 but c6 is less important than c7. If we change the ordering to (c7 < c6 < c5), FastDiag would then determine {c6, c7} as the preferred minimal diagnosis.

FastDiag Approach.

For the following discussions we introduce the set AC, which is initially set to C_KB ∪ C_R (the union of the customer requirements (C_R) and the configuration knowledge base (C_KB)) and is subsequently changed when the algorithm runs. The basic idea of the FastDiag algorithm (Algorithm 1) is the following. In our example, the set of customer requirements C_R = {c5, c6, c7} includes at least one minimal diagnosis since C_KB is consistent and C_KB ∪ C_R is inconsistent. In the extreme case C_R itself represents the minimal diagnosis, which means that all constraints in C_R are part of the diagnosis, i.e., each ci ∈ C_R represents a singleton conflict. In our case C_R obviously does not represent a minimal diagnosis; the set of diagnoses in our working example is {∆1 = {c5, c6}, ∆2 = {c5, c7}, ∆3 = {c6, c7}} (see Section 3). The next step in Algorithm 1 is to divide the set of customer requirements C_R = {c5, c6, c7} into the two sets C1 = {c5} and C2 = {c6, c7} and to check whether AC − C1 is already consistent. If this is the case, we can omit the set C2 since at least one minimal diagnosis can already be identified in C1. In our case, AC − {c5} is inconsistent, which means that we have to consider further elements from C2. Therefore, C2 = {c6, c7} is divided into the sets {c6} and {c7}. In the next step we can check whether AC − (C1 ∪ {c6}) is consistent; this is the case, which means that we do not have to further take into account {c7} for determining the diagnosis. Since {c5} does not include a diagnosis but {c5} ∪ {c6} includes a diagnosis, we can deduce that {c6} must be part of the diagnosis. The final step is to check whether AC − {c6} leads to a diagnosis without including {c5}. We see that AC − {c6} is inconsistent, i.e., ∆1 = {c5, c6} is a minimal diagnosis for the CR diagnosis problem (C_R = {c5, c6, c7}, C_KB = {c1, ..., c4}). An execution trace of the FastDiag algorithm in the context of our working example is shown in Figure 2.

(In Algorithm 1 we use the set C instead of C_R since the application of the algorithm is not restricted to inconsistent sets of customer requirements.)
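The divide-and-conquer procedure described above (Algorithm 1) can be sketched in Python as follows. This is a minimal sketch, assuming a brute-force `consistent` check over a hypothetical encoding of the working example in place of a real solver; ordered lists stand in for the constraint sets (least important first), and the helper names `minus`, `fd`, and `fastdiag` are our own. Like Algorithm 1, it assumes that AC itself is inconsistent.

```python
from itertools import product

# Hypothetical encoding of the working example (constraints as predicates).
DOMAINS = {
    "type": ("city", "limo", "combi", "xdrive"),
    "pdc": ("yes", "no"),
    "fuel": ("4l", "6l", "10l"),
    "skibag": ("yes", "no"),
    "4-wheel": ("yes", "no"),
}
CONSTRAINTS = {
    "c1": lambda a: a["4-wheel"] != "yes" or a["type"] == "xdrive",
    "c2": lambda a: a["skibag"] != "yes" or a["type"] != "city",
    "c3": lambda a: a["fuel"] != "4l" or a["type"] == "city",
    "c4": lambda a: a["fuel"] != "6l" or a["type"] != "xdrive",
    "c5": lambda a: a["type"] == "combi",
    "c6": lambda a: a["fuel"] == "4l",
    "c7": lambda a: a["4-wheel"] == "yes",
}
C_KB = ["c1", "c2", "c3", "c4"]
C_R = ["c5", "c6", "c7"]


def consistent(names):
    """Brute force: True iff some complete assignment satisfies all `names`."""
    variables = list(DOMAINS)
    for values in product(*(DOMAINS[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(CONSTRAINTS[n](assignment) for n in names):
            return True
    return False


def minus(ac, removed):
    """AC − removed, preserving the order of AC."""
    return [c for c in ac if c not in removed]


def fastdiag(c, ac):
    """Preferred minimal diagnosis within C (least important first);
    returns [] if C is empty or AC − C is still inconsistent."""
    if not c or not consistent(minus(ac, c)):
        return []
    return fd([], c, ac)


def fd(d, c, ac):
    if d and consistent(ac):
        return []  # removing D already restores consistency: nothing in C is needed
    if len(c) == 1:
        return list(c)  # AC is still inconsistent, so this constraint must be deleted
    k = len(c) // 2
    c1, c2 = c[:k], c[k:]
    d1 = fd(c1, c2, minus(ac, c1))  # try to find the diagnosis in C2 with C1 removed
    d2 = fd(d1, c1, minus(ac, d1))  # then search C1 with D1 removed
    return d1 + d2


AC = C_KB + C_R
print(sorted(fastdiag(["c5", "c6", "c7"], AC)))  # ['c5', 'c6']
print(sorted(fastdiag(["c7", "c6", "c5"], AC)))  # ['c6', 'c7']
```

The second call reverses the importance ordering (c7 < c6 < c5), so the least important requirements are now c7 and c6, and the returned diagnosis changes accordingly.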

Algorithm 1 − FastDiag

1 procedure FastDiag(C ⊆ AC, AC = {c1..ct}) : diagnosis ∆
2   if isEmpty(C) or inconsistent(AC − C) then return ∅
3   else return FD(∅, C, AC);
4 procedure FD(D, C = {c1..cq}, AC) : diagnosis ∆
5   if D ≠ ∅ and consistent(AC) then return ∅;
6   if singleton(C) then return C;
7   k = q/2;
8   C1 = {c1..ck}; C2 = {ck+1..cq};
9   D1 = FD(C1, C2, AC − C1);
10  D2 = FD(D1, C1, AC − D1);
11  return (D1 ∪ D2);

Calculating n > 1 Diagnoses.

In order to be able to calculate n > 1 diagnoses with FastDiag, we have to adapt the HSDAG construction introduced in [24] by substituting the resolution of conflicts (see Figure 1) with the deletion of elements ci from C_R (C) (see Figure 3). In this case, a path in the HSDAG is closed if no further diagnoses can be identified for this path or the elements of the current path are a superset of an already closed path (containment check). In line with the HSDAG approach presented in [24], we expand the search tree in a breadth-first manner. In our working example, we can delete c5 (one element of the first diagnosis ∆1 = {c5, c6}) from the set C_R of diagnosable elements and restart the algorithm for finding another minimal diagnosis among the remaining diagnosable elements {c6, c7}, while c5 itself is kept in AC. Since AC − {c6, c7} = C_KB ∪ {c5} is consistent, we can conclude that {c6, c7} includes another minimal diagnosis (∆3 = {c6, c7}), which is determined by FastDiag for the diagnosable set C_R − {c5}. Finally, we have to check whether deleting c6 instead (diagnosable set {c5, c7}) leads to another minimal diagnosis. This is the case, i.e., we have identified the last minimal diagnosis, which is ∆2 = {c5, c7}. The calculation of all diagnoses in our working example on the basis of FastDiag is depicted in Figure 3.

Note that for a given set of constraints (C), FastDiag always calculates the preferred diagnosis in terms of Definition 7. If ∆1 is the diagnosis returned by FastDiag and we delete one element from ∆1 (e.g., c5), then FastDiag returns the preferred diagnosis for the CR diagnosis problem (C_R − {c5}, C_KB ∪ {c5}), which is ∆3 in our example case, i.e., ∆1 >lex ∆3. Consequently, diagnoses that are part of one path in the search tree (such as ∆1 and ∆3 in Figure 3) are in a strict preference ordering. However, there is only a partial order between diagnoses in the search tree, in the sense that a diagnosis at level k is not necessarily preferable to a diagnosis at level k + 1.

(Typically, a CR diagnosis problem has more than one related diagnosis.)

Figure 1. HSDAG (Hitting Set Directed Acyclic Graph) [24] for the CR diagnosis problem (C_R = {c5, c6, c7}, C_KB = {c1, c2, c3, c4}). The sets {c5, c6}, {c5, c7}, and {c6, c7} are the minimal diagnoses; the conflict sets CS1, CS2, and CS3 are determined on the basis of QuickXPlain [18].

Figure 2. FastDiag execution trace for the CR diagnosis problem (C_R = {c5, c6, c7}, C_KB = {c1, c2, c3, c4}). The enumerations 1–6 show the order in which the different incarnations of the FD function (procedure) are activated.

FastDiag Properties.

A detailed listing of the basic operations of FastDiag is shown in Algorithm 1. First, the algorithm checks whether the constraints in C contain a diagnosis, i.e., whether AC − C is consistent; the function assumes that it is activated in the case that AC is inconsistent. If AC − C is inconsistent or C = ∅, FastDiag returns the empty set as result (no solution can be found; line 2 of the algorithm). If at least one diagnosis is contained in the set of constraints C, FastDiag activates the FD function (procedure), which is in charge of retrieving a preferred diagnosis (line 3 of the algorithm). FastDiag follows a divide-and-conquer strategy where the recursive function FD divides the set of constraints (in our case the elements of C_R) into two different subsets C1 and C2 (line 8 of the algorithm) and tries to figure out whether C1 already contains a diagnosis (line 5 of the algorithm). If this is the case, FastDiag does not further take into account the constraints in C2. If only one element is remaining in the current set of constraints C and the current set of constraints in AC is still inconsistent, then the element in C is part of a minimal diagnosis (line 6 of the algorithm). FastDiag is complete in the sense that if C contains exactly one minimal diagnosis, then FD will find it. If there are multiple minimal diagnoses, then one of them (the preferred one; see Definition 7) is returned. The recursive function FD is triggered if AC − C is consistent and C consists of at least one constraint. In such a situation a corresponding minimal diagnosis can be identified. If we assume the existence of a minimal diagnosis ∆ that cannot be identified by FastDiag, this would mean that there exists at least one constraint ca in C which is part of the diagnosis but not returned by FD. The only way in which elements can be deleted from C (i.e., not included in a diagnosis) is the return ∅ statement in FD, and ∅ is only returned in the case that AC is consistent, which means that the elements of C2 (respectively C1) from the previous FD incarnation are not part of the preferred diagnosis. Consequently, it is not possible to delete elements from C which are part of the diagnosis. FastDiag computes only minimal diagnoses in the sense of Definition 5. If we assume the existence of a non-minimal diagnosis ∆ calculated by FastDiag, this would mean that there exists at least one constraint ca such that ∆ − {ca} is still a diagnosis. The only situation in which elements of C are added to a diagnosis ∆ is if C itself contains exactly one element.
If C contains only one element (let us assume ca) and AC is inconsistent (in the function FD), then ca is the only element that can be deleted from AC, i.e., ca must be part of the diagnosis.

Performance of FastDiag.

In this section we compare the performance of FastDiag with the performance of the hitting set algorithm [24] in combination with the QuickXPlain conflict detection algorithm introduced in [18]. The worst-case complexity of FastDiag in terms of the number of consistency checks needed for calculating one minimal diagnosis is $2d \cdot \log_2(n/d) + 2d$, where d is the minimal diagnosis set size and n is the number of constraints (in C). The best-case complexity is $\log_2(n/d) + 2d$. In the worst case each element of the diagnosis is contained in a different path of the search tree: $\log_2(n/d)$ is the depth of the path, and $2d$ represents the branching factor and the number of leaf-node consistency checks. In the best case all elements of the diagnosis are contained in one path of the search tree. The worst-case complexity of QuickXPlain in terms of consistency checks needed for calculating one minimal conflict set is $2k \cdot \log_2(n/k) + 2k$, where k is the minimal conflict set size and n is again the number of constraints (in C) [18]. The best-case complexity of QuickXPlain in terms of the number of consistency checks needed is $\log_2(n/k) + 2k$ [18]. Consequently, the number of consistency checks per conflict set (QuickXPlain) and the number of consistency checks per diagnosis (FastDiag) fall into a logarithmic complexity class. Let $n_{cs}$ be the number of minimal conflict sets in a constraint set and $n_{diag}$ the number of minimal diagnoses. Then, for determining all diagnoses, the FastDiag-based approach needs $n_{diag}$ FD calls (see Algorithm 1) plus $n_{cs}$ additional consistency checks, whereas the hitting set based approach needs $n_{cs}$ activations of QuickXPlain plus $n_{diag}$ additional consistency checks.

Figure 3. FastDiag: calculating the complete set of minimal diagnoses. The enumerations 1–6 show the order in which the different incarnations of the FastDiag algorithm are activated.

The results of a performance evaluation of FastDiag are depicted in Figures 4–7. The basis for these evaluations was the bicycle configuration knowledge base taken from the CLib configuration benchmarks library (34 variables and about 65 constraints). For this example knowledge base we randomly generated different sets of requirements (of cardinality 5, 7, 10, and all possible requirements) and measured the performance of calculating corresponding diagnosis sets (the first diagnosis, the first 5 diagnoses, the first 10 diagnoses, and all diagnoses). The runtime performance of the different diagnosis algorithms and the needed number of TP calls are shown in Figures 4–7. As solver we used the CLib-based decision diagram representation, which allows for backtracking-free solution search. The tests have been executed on a standard desktop computer (Intel(R) Core(TM)2 Quad CPU Q9400

CPU with … and … RAM). Note that we have evaluated the performance of FastDiag with several other benchmark configuration knowledge bases from the CLib web page, with basically the same result. FastDiag proves to be a valuable alternative for determining diagnoses in interactive settings, especially for calculating the preferred first-n solutions.

Figure 4 shows a comparison between the hitting set based diagnosis approach (denoted as HSDAG) and the FastDiag algorithm (denoted as FastDiag) in the case that only one diagnosis is calculated. FastDiag clearly outperforms the HSDAG approach, independent of the way in which diagnoses are calculated (breadth-first or best-first). Figure 5 shows the performance evaluation for calculating the topmost-5 minimal diagnoses. The result is similar to the one for calculating the first diagnosis, i.e., FastDiag outperforms the two HSDAG versions. Our evaluations show that FastDiag is very efficient in calculating preferred minimal diagnoses.

Empirical Evaluation.

Based on a computer configuration dataset of the Graz University of Technology (N = 415 configurations), we evaluated the three presented approaches w.r.t. their capability of predicting diagnoses that are acceptable for the user (diagnoses leading to selected configurations). Each entry of the dataset consists of a set of initial user requirements C_R that is inconsistent with the configuration knowledge base C_KB, and the configuration which had finally been selected by the user. Since the original requirements stored in the dataset are inconsistent with the configuration knowledge base, we could determine those diagnoses that indicate which minimal sets of requirements have to be deleted in order to be able to find a solution.

We evaluated the prediction accuracy of the three diagnosis approaches (HSDAG breadth-first, FastDiag, and HSDAG best-first). First, we measured the distance between the predicted position of a diagnosis leading to a selected configuration and the expected position of the diagnosis (which is 1). This distance was measured in terms of the root mean square deviation, RMSD (see Formula 1). Table 1 depicts the results of this first analysis. An important result is that FastDiag has the lowest RMSD value (0.95). Best-first HSDAG has a similar prediction quality (RMSD = 0.97). Finally, breadth-first HSDAG has the worst prediction quality (RMSD = 1.64).
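The two accuracy measures used in this evaluation are easy to compute from the rank positions of the user-accepted diagnoses. In this sketch the function names are hypothetical, the positions are toy values rather than data from the study, and `precision_at` encodes one possible reading of Formula 2: a session counts as correctly predicted when the diagnosis leading to the selected configuration is ranked among the topmost n.

```python
import math


def rmsd(positions):
    """Formula 1: root mean square deviation of the predicted positions
    from the expected position 1."""
    return math.sqrt(sum((p - 1) ** 2 for p in positions) / len(positions))


def precision_at(positions, n):
    """Share of sessions whose accepted diagnosis is among the topmost-n
    ranked diagnoses (one reading of Formula 2)."""
    return sum(1 for p in positions if p <= n) / len(positions)


# Hypothetical ranks of the diagnosis that led to the selected configuration.
positions = [1, 1, 2, 3]
print(round(rmsd(positions), 3))   # 1.118
print(precision_at(positions, 1))  # 0.5
print(precision_at(positions, 2))  # 0.75
```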

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{predicted position}_i - \text{real position}_i\right)^2} \qquad (1)$$

RMSD is an often-used quality estimate, but it provides only a limited view of the precision of a (diagnosis) prediction. We therefore also analyzed the precision of the diagnosis selection strategies discussed in this paper; a measure for the precision of a diagnosis algorithm is given in Formula 2. The idea behind this measure is to describe how often a diagnosis that leads to the configuration selected by the user is among the topmost-n ranked diagnoses. As shown in Table 2, FastDiag and best-first HSDAG have the highest prediction accuracy in terms of precision, whereas the breadth-first HSDAG approach shows the worst precision.

$$\text{precision} = \frac{|\text{correctly predicted diagnoses}|}{|\text{predicted diagnoses}|} \qquad (2)$$

We applied a Mann-Whitney U test to statistically analyze differences in ranking behavior between the three diagnosis approaches, conducting a pairwise comparison between the approaches. We identified a significant difference between the rankings of best-first HSDAG and breadth-first HSDAG based diagnosis (p = 6.…e−…) and also between FastDiag and breadth-first HSDAG based diagnosis (p < .…e−…). There was no significant difference between best-first HSDAG and FastDiag in terms of ranking behavior (p = 0.…).

Figure 4. Calculating the first minimal diagnosis with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the left-hand side and number of needed TP calls on the right-hand side.

Figure 5. Calculating the topmost-5 minimal diagnoses with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the left-hand side and number of needed TP calls on the right-hand side.

Knowledge Base Analysis. The authors of [10] introduce an algorithm for the automated debugging of configuration knowledge bases. The idea is to combine a conflict detection algorithm such as QuickXplain [18] with the hitting set algorithm used in model-based diagnosis (MBD) [24] for the calculation of minimal diagnoses. In this context, conflicts are induced by test cases (examples) that, for example, stem from previous configuration sessions, have been automatically generated, or have been explicitly defined by domain experts. Further applications of MBD in constraint set debugging are introduced in [9], where diagnosis concepts are used to identify minimal sets of faulty transition conditions in state charts, and in [12], where MBD is applied for the identification of faulty utility constraint sets in the context of knowledge-based recommendation. In contrast to [10, 9, 12], our work provides an algorithm that determines diagnoses directly, without the need to determine corresponding conflict sets. FastDiag can be applied in knowledge engineering scenarios for calculating preferred diagnoses for faulty knowledge bases, given that we are able to determine a reasonable ordering for the given set of constraints; this could be achieved, for example, by applying corresponding complexity metrics [4].
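In Reiter's theory [24], minimal diagnoses correspond to the minimal hitting sets of the collection of minimal conflict sets. The following Python sketch makes that relationship concrete. It is a brute-force enumeration by increasing cardinality rather than Reiter's HSDAG, and the conflict sets are hypothetical values standing in for what a checker such as QuickXplain would return. It is exponential and meant only for toy inputs.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Enumerate all minimal hitting sets of a collection of conflict sets.

    Brute-force sketch (not HSDAG): candidates are generated in order of
    increasing cardinality, so a candidate that contains an already
    accepted hitting set cannot be minimal and is skipped.
    """
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            s = set(candidate)
            if any(h <= s for h in found):
                continue  # superset of a known minimal hitting set
            if all(s & c for c in conflicts):
                found.append(s)  # intersects every conflict set
    return found


# Hypothetical minimal conflicts over requirements c1..c4.
conflicts = [{"c1", "c2"}, {"c2", "c3", "c4"}]
print(minimal_hitting_sets(conflicts))
```

For the two hypothetical conflicts, the sketch yields the diagnoses {c2}, {c1, c3}, and {c1, c4}: each one contains at least one element of every conflict, so deleting it breaks all known conflicts.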

Conflict Detection. In contrast to the algorithm presented in this paper, calculating diagnoses for inconsistent requirements typically relies on the existence of (minimal) conflict sets. QuickXplain [18], a well-known algorithm whose number of consistency checks grows logarithmically with the number of constraints in the knowledge base and depends on the cardinality of the minimal conflicts, has made a major contribution to more efficient interactive constraint-based applications. QuickXplain is based on a divide-and-conquer strategy. FastDiag relies on the same divide-and-conquer principle but with a different focus, namely the determination of minimal diagnoses. QuickXplain calculates minimal conflict sets based on the assumption of a linear preference ordering among the constraints. Similarly, if we assume a linear preference ordering of the constraints in C, FastDiag calculates preferred diagnoses.
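To illustrate the divide-and-conquer principle, here is a minimal Python sketch of QuickXplain following the recursive scheme described by Junker [18]. It runs against a hypothetical toy problem (variables x and y over {0, 1, 2, 3}) with a brute-force consistency check; the names `quickxplain`, `consistent`, and the constraints c1 to c4 are illustrative assumptions, not an existing API.

```python
from itertools import product

# Hypothetical toy problem: two variables x, y with domain {0, 1, 2, 3}.
DOMAIN = range(4)
CONSTRAINTS = {
    "c1": lambda x, y: x >= 2,
    "c2": lambda x, y: x <= 1,
    "c3": lambda x, y: y == x,
    "c4": lambda x, y: y == 3,
}


def consistent(names):
    """Brute-force check: does some (x, y) satisfy all named constraints?"""
    return any(
        all(CONSTRAINTS[n](x, y) for n in names)
        for x, y in product(DOMAIN, repeat=2)
    )


def quickxplain(background, constraints):
    """Return a minimal conflict within `constraints`, or None if consistent."""
    if consistent(background + constraints):
        return None  # no conflict exists
    if not constraints:
        return []
    return _qx(background, background, constraints)


def _qx(b, delta, c):
    # Constraints were just added to b; if b is already inconsistent,
    # nothing from c is needed to complete the conflict.
    if delta and not consistent(b):
        return []
    if len(c) == 1:
        return list(c)
    k = len(c) // 2
    c1, c2 = c[:k], c[k:]
    d2 = _qx(b + c1, c1, c2)  # conflict part in c2, assuming all of c1
    d1 = _qx(b + d2, d2, c1)  # conflict part in c1, assuming d2
    return d1 + d2


print(quickxplain([], ["c1", "c2", "c3", "c4"]))  # ['c1', 'c2']
```

With the ordering c1, c2, c3, c4 the sketch returns ['c1', 'c2'] rather than the other minimal conflict {c2, c3, c4}, since it favours conflicts built from constraints that appear earlier in the list.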

Interactive Settings. Note that in the interactive configuration scenario discussed in this paper, our goal was to support open configuration, which lets the user explore the configuration space while the system proactively points out inconsistent requirements; such functionality is often provided by commercial configuration environments. The authors of [23] focus on interactive settings where users of constraint-based applications are confronted with situations in which no solution can be found. In this context, [23] introduce the concept of minimal exclusion sets, which correspond to the concept of minimal diagnoses as defined in [24]. As mentioned, the major focus of [23] is on settings where the proposed algorithm supports users in the identification of acceptable exclusion sets. The authors propose an algorithm (representative explanations) that helps to improve the quality of the presented exclusion sets (in terms of diversity) and thus increases the probability of finding an exclusion set that is acceptable to the user. Our diagnosis approach calculates preferred diagnoses in terms of a predefined ordering of the constraint set. Thus, compared to the work of [23], we follow a different approach by focusing more on preferences than on the degree of representativeness.

Diagnosis Algorithms. Several algorithms improve the efficiency of diagnosis determination; they are further developments of the original algorithm introduced by Reiter [24]. These approaches focus on making the construction of hitting sets more efficient. Wotawa [29] introduces an algorithm that reduces the number of subset checks compared to the original HSDAG approach [24]. Fijany and Vatan [13] introduce an approach that represents the problem of determining minimal hitting sets as a corresponding integer programming problem. Further approaches to optimize the determination of hitting sets are discussed in [20]. All of these approaches rely on (minimal) conflict sets as the basis for calculating a set of minimal diagnoses, whereas FastDiag is a complete and minimal diagnosis algorithm that does not need conflict sets. Note that, especially when calculating the first n diagnoses (for n > 1, i.e., more than a single diagnosis), FastDiag can also exploit the algorithms of [20, 29]; it is not bound to the original HSDAG algorithm. Lin and Jiang [19] introduce an approach to determine hitting sets on the basis of genetic algorithms; a similar approach to the determination of diagnoses is presented in [8], who introduce a stochastic fault diagnosis algorithm based on greedy stochastic search. Such approaches have been shown to significantly improve search performance; however, they offer no general guarantee of completeness and diagnosis minimality. Finally, some algorithms improve the performance of diagnosis calculation by exploiting additional knowledge about the structural properties of the diagnosis problem. For example, [17] show the determination of (minimal) diagnoses for the case of conjunctive queries on database tables (the set of diagnoses can be precompiled by executing the individual parts of the query on the given dataset). Siddiqi and Huang [25] show one approach to exploit structural properties of system descriptions to improve the overall performance of diagnosis determination: in this case, cones are regions of a circuit with a certain structure and a certain probability of including a diagnosis, and the search process focuses on exactly those regions. FastDiag does not exploit specific properties of the underlying constraint set; however, taking such properties into account could further improve the performance of the algorithm, and corresponding evaluations are within the scope of future work.

Figure 6. Calculating the topmost-10 minimal diagnoses with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the left-hand side and number of needed TP calls on the right-hand side.

Figure 7. Calculating all minimal diagnoses with FastDiag vs. hitting set based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in msec on the left-hand side and number of needed TP calls on the right-hand side.
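The following Python sketch shows how a FastDiag-style recursion can find a minimal diagnosis directly, with no conflict sets involved. It follows the divide-and-conquer scheme of FastDiag but is written for illustration only: the toy constraints c1 to c4, the brute-force `consistent` check, and the extra guard for an already consistent constraint set are assumptions of this sketch. With this split order, constraints that appear later in the list are the ones the recursion works hardest to keep; treat that ordering convention as an assumption as well.

```python
from itertools import product

# Hypothetical toy problem: two variables x, y with domain {0, 1, 2, 3}.
DOMAIN = range(4)
CONSTRAINTS = {
    "c1": lambda x, y: x >= 2,
    "c2": lambda x, y: x <= 1,
    "c3": lambda x, y: y == x,
    "c4": lambda x, y: y == 3,
}


def consistent(names):
    """Brute-force check: does some (x, y) satisfy all named constraints?"""
    return any(
        all(CONSTRAINTS[n](x, y) for n in names)
        for x, y in product(DOMAIN, repeat=2)
    )


def fastdiag(c, ac):
    """Return a minimal diagnosis D (subset of c) such that ac minus D is consistent.

    c: ordered candidate constraints (a subset of ac); ac: all constraints.
    """
    if consistent(ac):
        return []  # guard added in this sketch: nothing to repair
    if not c or not consistent([x for x in ac if x not in c]):
        return []  # no diagnosis exists within c
    return _fd([], c, ac)


def _fd(d, c, ac):
    # d non-empty means constraints were just removed from ac; if ac is
    # now consistent, no element of c is needed in the diagnosis.
    if d and consistent(ac):
        return []
    if len(c) == 1:
        return list(c)
    k = len(c) // 2
    c1, c2 = c[:k], c[k:]
    d1 = _fd(c1, c2, [x for x in ac if x not in c1])  # diagnose c2, c1 removed
    d2 = _fd(d1, c1, [x for x in ac if x not in d1])  # diagnose c1, d1 removed
    return d1 + d2


order = ["c1", "c2", "c3", "c4"]
print(fastdiag(order, order))  # ['c2']
```

With the ordering c1, c2, c3, c4 the sketch returns ['c2']; with the ordering c1, c3, c4, c2 it keeps c2 and returns a diagnosis containing c1 and c3. Both results are minimal diagnoses, so the same toy problem yields different preferred diagnoses depending only on the ordering.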

Personalized Diagnosis. Many of the existing diagnosis approaches do not take into account the need to personalize the set of diagnoses presented to a user. Identifying diagnoses of interest efficiently clearly improves the acceptance of the underlying application; for example, users of a configurator application are not necessarily interested in minimal cardinality diagnoses [24] but rather in those that correspond to their current preferences. A first step towards the application of personalization concepts in the context of knowledge-based recommendation is presented in [11]. The authors introduce an approach that calculates leading diagnoses on the basis of similarity measures used for determining n-nearest neighbors. A general approach to the identification of preferred diagnoses is introduced in [5], where probability estimates are used to determine the leading diagnoses with the overall goal of minimizing the number of measurements needed for identifying a malfunctioning device. Basic principles of determining diagnoses in knowledge-based recommendation scenarios are discussed in [17]. Furthermore, [16] introduce a logical characterization of preferences, which are expressed as preference relations on single diagnoses and as modal logical formulas on groups of diagnoses. In contrast to our work, [16] do not provide an algorithm to efficiently calculate preferred diagnoses. We see our work as a major contribution in this context, since FastDiag helps to identify leading diagnoses more efficiently; further empirical studies in different application contexts are a major focus of our future work.
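To make the contrast between minimal cardinality and preference-based ranking concrete, the Python sketch below ranks the same set of diagnoses in two ways. The additive per-requirement weights are a hypothetical stand-in for user preferences; this is neither the similarity-based approach of [11] nor the probability-based approach of [5].

```python
def rank_by_cardinality(diagnoses):
    """Minimal cardinality first; ties broken alphabetically for determinism."""
    return sorted(diagnoses, key=lambda d: (len(d), sorted(d)))


def rank_by_weight(diagnoses, weight):
    """Hypothetical personalized ranking: the lower the total weight of the
    requirements a diagnosis removes, the earlier it is ranked."""
    return sorted(diagnoses, key=lambda d: (sum(weight[c] for c in d), sorted(d)))


diagnoses = [{"c2"}, {"c1", "c3"}, {"c1", "c4"}]
# Hypothetical importance a user attaches to each requirement.
weight = {"c1": 1, "c2": 5, "c3": 1, "c4": 2}

print(rank_by_cardinality(diagnoses)[0])  # {'c2'}
print(rank_by_weight(diagnoses, weight)[0])
```

Cardinality ranking puts {c2} first, whereas under these hypothetical weights the two-element diagnosis {c1, c3} comes first, because c2 matters most to this user.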

In this paper we have introduced a new diagnosis algorithm (FastDiag) which allows the efficient calculation of one diagnosis at a time with logarithmic complexity in terms of the number of consistency checks. Thus, the computational complexity of calculating one minimal diagnosis is equal to that of calculating one minimal conflict set in hitting set based diagnosis approaches. The algorithm is especially applicable in settings where the number of conflict sets is equal to or larger than the number of diagnoses, or in settings where preferred (leading) diagnoses are needed. Issues for future work are the determination of repair actions for diagnoses, the further development of FastDiag to support anytime diagnosis tasks, and further empirical studies in different configurator application domains.

Table 1. Root Mean Square Deviation (RMSD) of the diagnosis approaches.

| breadth-first (HSDAG) | FastDiag | best-first (HSDAG) |
|---|---|---|
| 1.64 | 0.95 | 0.97 |

Table 2. Precision of FastDiag vs. HSDAG based approaches.

| top-n diagnoses | breadth-first (HSDAG) | FastDiag | best-first (HSDAG) |
|---|---|---|---|
| n=1 | 0.51 | 0.70 | 0.74 |
| n=2 | 0.75 | 0.88 | 0.89 |
| n=3 | 0.87 | 0.97 | 0.96 |
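Formulas 1 and 2 are straightforward to compute once the position of each user-selected diagnosis is known. The Python sketch below assumes, as one reading of Formula 2, that precision for top-n is the share of sessions in which the selected diagnosis was ranked within the first n positions; this reading is consistent with the precision values in Table 2 increasing with n. The ranks are hypothetical, and treating position 1 as the real position is an assumption made only for this example.

```python
from math import sqrt


def rmsd(predicted, real):
    """Root mean square deviation between predicted and real positions (Formula 1)."""
    if not predicted or len(predicted) != len(real):
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, real)) / len(predicted))


def precision_at_n(ranks, n):
    """Share of sessions whose user-selected diagnosis is ranked within the top n.

    ranks[i] is the 1-based position the algorithm assigned to the diagnosis
    the user actually selected in session i (an assumed reading of Formula 2).
    """
    if not ranks:
        raise ValueError("need at least one session")
    return sum(1 for r in ranks if r <= n) / len(ranks)


# Hypothetical ranks of the user-selected diagnosis in four sessions.
ranks = [1, 2, 1, 3]
print(rmsd(ranks, [1] * len(ranks)))  # ~1.118
print([precision_at_n(ranks, n) for n in (1, 2, 3)])  # [0.5, 0.75, 1.0]
```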

References

[1] L. Ardissono, A. Felfernig, G. Friedrich, D. Jannach, G. Petrone, R. Schaefer, and M. Zanker. A framework for the development of personalized, distributed web-based configuration systems. AI Magazine, 24(3):93–108, 2003.
[2] F. Belanger. A conjoint analysis of online consumer satisfaction. Journal of Electronic Commerce Research, 6:95–111, 2005.
[3] L. Castillo, D. Borrajo, and M. Salido. Planning, Scheduling and Constraint Satisfaction: From Theory to Practice. IOS Press, 2005.
[4] Z. Chen and C. Suen. Measuring the complexity of rule-based expert systems. Expert Systems with Applications, 7(4):467–481, 2003.
[5] J. DeKleer. Using crude probability estimates to guide diagnosis. AI Journal, 45(3):381–391, 1990.
[6] J. DeKleer, A. Mackworth, and R. Reiter. Characterizing diagnoses and systems. AI Journal, 56(2–3):197–222, 1992.
[7] J. DeKleer and B. Williams. Diagnosing multiple faults. AI Journal, 32(1):97–130, 1987.
[8] A. Feldman, G. Provan, and A. Gemund. Computing minimal diagnoses by greedy stochastic search. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI'08), pages 911–918, Chicago, IL, 2008.
[9] A. Felfernig, G. Friedrich, K. Isak, K. Shchekotykhin, E. Teppan, and D. Jannach. Automated debugging of recommender user interface descriptions. Journal of Applied Intelligence, 31(1):1–14, 2007.
[10] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner. Consistency-based diagnosis of configuration knowledge bases. AI Journal, 152(2):213–234, 2004.
[11] A. Felfernig, G. Friedrich, M. Schubert, M. Mandl, M. Mairitsch, and E. Teppan. Plausible repairs for inconsistent requirements. In …, pages 791–796, Pasadena, CA, 2009.
[12] A. Felfernig, G. Friedrich, E. Teppan, and K. Isak. Intelligent debugging and repair of utility constraint sets in knowledge-based recommender applications. In …, pages 218–226, Canary Islands, Spain, 2008.
[13] A. Fijany and F. Vatan. New approaches for efficient solutions of hitting set problems. In International Symposium of Information and Communication Technologies, pages 1–10, Cancun, Mexico, 2004.
[14] G. Fleischanderl, G. Friedrich, A. Haselboeck, H. Schreiner, and M. Stumptner. Configuring large systems using generative constraint satisfaction. IEEE Intelligent Systems, 13(4):59–68, 1998.
[15] G. Friedrich and K. Shchekotykhin. A general diagnosis method for ontologies. In …, number 3729 in Lecture Notes in Computer Science, pages 232–246, Galway, Ireland, 2005. Springer.
[16] P. Froehlich, W. Nejdl, and M. Schroeder. A formal semantics for preferences and strategies in model-based diagnosis. In …, pages 106–113, 1994.
[17] D. Jannach and J. Liegl. Conflict-directed relaxation of constraints in content-based recommender systems. In IEA/AIE 2006, pages 819–829, Annecy, France, 2006.
[18] U. Junker. QuickXplain: Preferred explanations and relaxations for over-constrained problems. In …, pages 167–172, San Jose, CA, 2004.
[19] L. Lin and Y. Jiang. Computing minimal hitting sets with genetic algorithms. Algorithmica, 32(1):95–106, 2002.
[20] L. Lin and Y. Jiang. The computation of hitting sets: review and new algorithm. Information Processing Letters, 86:177–184, 2003.
[21] J. Marques-Silva and K. Sakallah. GRASP: A new search algorithm for satisfiability. In International Conference on Computer-Aided Design, pages 220–227, Santa Clara, CA, 1996.
[22] S. Mittal and F. Frayman. Towards a generic model of configuration tasks. In …, pages 1395–1401, Detroit, MI, 1989.
[23] B. O'Sullivan, A. Papadopoulos, B. Faltings, and P. Pu. Representative explanations for over-constrained problems. In …, pages 323–328, Vancouver, Canada, 2007.
[24] R. Reiter. A theory of diagnosis from first principles. AI Journal, 32(1):57–95, 1987.
[25] S. Siddiqi and J. Huang. Hierarchical diagnosis of multiple faults. In …, pages 581–586, Hyderabad, India, 2007.
[26] C. Sinz and A. Haag. Configuration. IEEE Intelligent Systems, 22(1):78–90, 2007.
[27] E. Tsang. Foundations of Constraint Satisfaction. Academic Press, 1993.
[28] D. Winterfeldt and W. Edwards. Decision Analysis and Behavioral Research. Cambridge University Press, 1986.
[29] F. Wotawa. A variant of Reiter's hitting-set algorithm.