Recommender Systems for Configuration Knowledge Engineering
Alexander Felfernig, Stefan Reiterer, Martin Stettinger, Florian Reinfrank, Michael Jeran, Gerald Ninaus
Graz University of Technology
Inffeldgasse 16b, A-8010 Graz, Austria
{felfernig,reiterer,stettinger,reinfrank,jeran,ninaus}@ist.tugraz.at

Abstract
The knowledge engineering bottleneck is still a major challenge in configurator projects. In this paper we show how recommender systems can support knowledge base development and maintenance processes. We discuss several scenarios for the application of recommender systems in knowledge engineering and report the results of empirical studies which show the importance of user-centered configuration knowledge organization.
∗ The work presented in this paper has been funded by the Austrian Research Promotion Agency (Project: ICONE (827587)).

1 Introduction

Product knowledge changes frequently [Soloway, 1987]. Therefore, it must be possible to conduct knowledge base development and maintenance operations efficiently. Since the early developments of configurator applications in the late 1970s and early 1980s [McDermott, 1982], knowledge representations have been improved in terms of (1) model-based approaches which allow a clear separation of domain knowledge and problem solving algorithms, (2) higher-level knowledge representations which allow a component-oriented representation of configuration knowledge (see, e.g., [Stumptner et al., 1998]), and (3) graphical knowledge representations (e.g., [Felfernig et al., 2000; Felfernig et al., 2001]) which allow a compact representation. In addition to new knowledge representations, intelligent diagnosis approaches have been developed which help a knowledge engineer to identify and repair erroneous configuration knowledge [Junker, 2004; Felfernig et al., 2004; Felfernig et al., 2009; Felfernig et al., 2013].

Due to diversification strategies of companies, product and service assortments are becoming increasingly large and complex [Huffman and Kahn, 1998]. The complexity of the underlying knowledge bases increases to the same extent, which requires additional concepts that help a knowledge engineer to conduct knowledge base development and maintenance operations efficiently. Furthermore, knowledge bases are often developed by a group of persons with different knowledge, goals, and focuses with regard to development and maintenance operations. This situation requires adaptive user interfaces to be integrated into configuration knowledge engineering environments.
Adaptive user interfaces for knowledge engineering have the potential to effectively support engineers and domain experts in activities such as learning (knowledge base understanding), finding (the relevant items in the knowledge base), and testing & debugging (removing the source of faulty behavior).

In order to offer more adaptivity in configurator development environments, we propose the application of different types of recommendation technologies [Jannach et al., 2010] which proactively support domain experts and engineers when creating and adapting configuration knowledge. Such technologies should have a basic understanding of the cognitive processes involved when persons develop and maintain configuration knowledge bases. They should support functionalities such as recommending relevant items (variables, component types, constraints, diagnoses, etc.) and simultaneously omitting specific items that are not relevant. Recommender systems have the potential to provide such support (see, e.g., [Robillard et al., 2010]).

There are three basic recommendation approaches.
First, collaborative filtering [Konstan et al., 1997] determines recommendations based on the preferences of nearest neighbors (users with similar preferences compared to the current user). In this context, items are recommended to the current user which have received a positive rating by the nearest neighbors but are not known to the current user. Second, content-based filtering [Pazzani and Billsus, 1997] recommends items that are not known to the current user and are similar to items that have already been purchased by her/him. Similarity between items can be determined, for example, on the basis of the similarity of keywords used to describe the item. Third, knowledge-based recommenders recommend items by using constraints or similarity metrics [Burke, 2000; Felfernig and Burke, 2008].

This paper is organized as follows. In Section 2 we introduce example scenarios for the application of recommender technologies in knowledge engineering. Thereafter, we report results of related empirical studies (see Section 3). In Section 4 we provide a discussion of related work. Conclusions and a discussion of future research issues are given in Section 5.

2 Recommenders for Knowledge Engineering
Collaborative Recommendation of Constraints. Collaborative filtering (CF) recommender systems have been shown to be one of the best choices to achieve serendipity effects, i.e., to be surprised (in a positive sense) by item recommendations one did not expect when starting the recommendation process. In situations where knowledge engineers do not know the configuration knowledge base very well, collaborative recommendations can be exploited to support a more focused analysis of the knowledge base. The availability of navigation data from other knowledge engineers is the major precondition for determining recommendations with collaborative filtering. Table 1 shows an example of navigation data that describes in which order knowledge engineers (users) accessed the constraints of a knowledge base. For simplicity we assume that each of the users accessed each constraint (but in a different order). Similar applications of collaborative filtering can be imagined for the recommendation of variables (or component types) and instances of a component catalog.

Table 1 stores the information in which order the constraints have been visited by knowledge engineers (users), for example, user analyzed the constraints in the order [c, c, c, c, c, c]. Let us assume that the current user has already visited the constraints c and (then) c. The nearest neighbors of the current user (users with a similar navigation behavior) are the users , , and . The majority of these users analyzed constraint c in the third step; this constraint will be recommended to the current user. Note that this recommendation approach is currently under evaluation, therefore no related empirical results will be reported in Section 3.

[Table 1: navigation data per user over the constraints (c_i), used with CF; table body not recoverable.]

Content-based Clustering of Constraints. Another possibility to support knowledge engineers is to cluster constraints with the goal of improving the overall clarity of the knowledge base.
We will exemplify this on the basis of k-means clustering [Witten and Frank, 2005]. Following this approach, we have to generate k initial centroids which act as (first) representatives of future clusters. Then, each object (in our case: constraint) is assigned to the group (cluster) with the closest (most similar) centroid. Thereafter, centroids are recalculated. In our case, a centroid is defined as the object with the highest overall similarity to the other objects in the cluster. The algorithm terminates if the centroids are stable (do not change). k-means clustering is guaranteed to terminate but is not necessarily optimal since the outcome depends on the initial centroids [Witten and Frank, 2005].

For demonstration purposes we introduce the following simple configuration problem which is represented as a basic constraint satisfaction problem (CSP = (V, D, C)) where V represents a set of variables {v, v, ..., v}, D represents the set of corresponding domains (dom(v_i) = { .. }), and C represents the following set of constraints.

{ c : v = 3 → v > , c : v = 3 ∧ v = 1, c : v = 2 → v = 1, c : v = 1 → v = 1, c : v = 1 → (v = 2 ∧ v > v), c : v ≥ → v ≤ , c : v = 1 → v = 2 ∨ v = 3 }.

On the basis of this simple knowledge base, we can calculate the similarities between the individual constraints (c_a, c_b) by using Formula 1. In this formula, V = variables(c_a) ∪ variables(c_b), co-occurrence(v, c_a, c_b) = 1 if v is contained in both constraints on the same position, co-occurrence(v, c_a, c_b) = 0.5 if v is contained in both constraints but on a different position, and co-occurrence(v, c_a, c_b) = 0 if no co-occurrence exists. Note that this is one possible approach to similarity determination. We also compared this approach with operator-based similarity and a random assignment of constraints to clusters.
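As a rough illustration of the similarity measure of Formula 1 and of the clustering loop described above, the following Python sketch represents each constraint as an ordered tuple of variable names. This representation, the reading of "position" as the index within that tuple, the tie-breaking (first best candidate wins), the iteration cap, and all names (`co_occurrence`, `sim`, `cluster`) are assumptions made for this sketch; they are not taken from the implementation used in the studies.

```python
from typing import Dict, List, Tuple

# Assumption: a constraint is reduced to its ordered variable occurrences,
# e.g. ("v1", "v2") for a constraint that mentions v1 first and v2 second.
Constraint = Tuple[str, ...]


def co_occurrence(v: str, ca: Constraint, cb: Constraint) -> float:
    """1 if v occurs in both constraints at a shared position,
    0.5 if it occurs in both but never at the same position, else 0."""
    pos_a = {i for i, name in enumerate(ca) if name == v}
    pos_b = {i for i, name in enumerate(cb) if name == v}
    if not pos_a or not pos_b:
        return 0.0
    return 1.0 if pos_a & pos_b else 0.5


def sim(ca: Constraint, cb: Constraint) -> float:
    """Formula 1: summed co-occurrence over V = variables(ca) | variables(cb),
    divided by |V|. Assumes each constraint mentions at least one variable."""
    variables = set(ca) | set(cb)
    return sum(co_occurrence(v, ca, cb) for v in variables) / len(variables)


def cluster(constraints: Dict[str, Constraint],
            centroids: List[str],
            max_iter: int = 100) -> Dict[str, List[str]]:
    """Assign each constraint to its most similar centroid, then re-pick each
    centroid as the member with the highest overall similarity to the other
    members; stop when the centroids no longer change."""
    centroids = list(centroids)
    clusters: Dict[str, List[str]] = {}
    for _ in range(max_iter):  # safety cap, an assumption of this sketch
        clusters = {c: [] for c in centroids}
        for name, con in constraints.items():
            nearest = max(centroids, key=lambda c: sim(con, constraints[c]))
            clusters[nearest].append(name)
        new_centroids = []
        for old, members in clusters.items():
            if not members:  # keep the old centroid for an empty cluster
                new_centroids.append(old)
                continue
            new_centroids.append(max(
                members,
                key=lambda m: sum(sim(constraints[m], constraints[o])
                                  for o in members if o != m)))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters
```

A call such as `cluster({"a": ("v1", "v2"), "b": ("v1", "v3")}, ["a", "b"])` returns a dictionary that maps each final centroid to the names of the constraints in its cluster. Because the centroid is always one of the cluster members, the procedure is closer to a k-medoids variant than to textbook k-means, which matches the centroid definition given in the text.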
$$\mathrm{sim}(c_a, c_b) = \frac{\sum_{v \in V} \text{co-occurrence}(v, c_a, c_b)}{|V|} \qquad (1)$$

The similarities between the pairs of individual constraints are depicted in Table 2.

[Table 2: similarities between the constraints c_i ∈ C; table body not recoverable.]

[…] (k = 2). The determination of such clusters is exemplified in Table 3. First, we (randomly) select two constraints as initial cluster centers (centroids): c and c (denoted by cs). In iteration the center of cluster changes to c and we have to re-calculate the cluster assignment. After this iteration, the assignment is stable, i.e., the cluster centers (c and c) remain the same.

[Table 3: k-means clustering of C = {c, c, ..., c}; table body not recoverable.]

For the visualization of the constraints {c, c, ..., c} this means that the knowledge base would be presented in terms of two constraint groups: {c, c, c, c, c} and {c, c}.

Knowledge-based Refactoring Recommendations. The way in which semantics is expressed has an impact on the understandability of the knowledge base. For example, users need less time to understand the semantics of a knowledge base if implications are expressed in terms of A → B compared to the alternative representation of ¬A ∨ B. Explicit knowledge about the cognitive complexity of constraint representations can be exploited to recommend structural and semantics-preserving adaptations of knowledge structures. Such recommendations are knowledge-based, since they are explicitly encoded in refactoring rules.

3 Empirical Studies

For the content-based clustering of constraints and knowledge-based refactoring recommendations we now present the results of two empirical studies. In the first study, we compared the applicability of three different clustering strategies with regard to knowledge engineering tasks (find a solution, find a minimal conflict) (see, e.g., [Junker, 2004]).

Study A: Clustering of Constraints. For two different configuration knowledge bases (kba, kba) we conducted a study based on a within-subjects design (N=40).
Each study participant (students of computer science who attended a related course on knowledge engineering) had the task of (1) finding a solution (in kba) and (2) finding a minimal conflict (in kba). There were no time limits regarding task completion. Each student was assigned to one type of clustering (one out of variable-based similarity, operator-based similarity, and random clustering), i.e., we did not vary the type of clustering per student. The knowledge bases (kba, kba) were defined as CSPs in a domain-independent fashion in order to avoid an additional cognitive complexity related to the understanding of a product domain. The basic properties of the used knowledge bases are summarized in Table 4.

Knowledge base    variables (v_i ∈ V)    v_i domain size    constraints (c_i ∈ C)
[Rows kba and kba; table body only partly recoverable, surviving values: 10, 3, 10.]

Table 4: Knowledge bases used in Study A.

The outcome of this experiment is shown in Table 5.
Grouping approach    kba: SOL    kba: CON
Similar variables    21.43%      42.86%
Similar operators    30.77%      53.85%
Random               38.46%      76.92%

Table 5: Error rates for completing the tasks find a solution (SOL) and find a conflict (CON) depending on clustering approach (variable-based, operator-based, or random).

Of the three compared approaches to the clustering of constraints in a configuration knowledge base, clustering based on variable similarity shows the lowest error rates for both tasks, ahead of operator-based clustering and random clustering of constraints.
Study B: Cognitive Complexities. There are different possibilities to represent equivalent semantics on the basis of a constraint, for example, the requires relationship X → Y can be represented in terms of ¬X ∨ Y. The incompatibility relationship ¬(X ∧ Y) can be represented as X → ¬Y. Table 6 depicts five different possibilities to express requires and incompatibility relationships.

Footnote: We used these tasks (find a solution, find a minimal conflict) to measure knowledge understanding. Further, more differentiated tasks are within the scope of future work.

Requires        Incompatibility
X → Y           X → ¬Y
¬X ∨ Y          ¬X ∨ ¬Y
¬Y → ¬X         Y → ¬X
¬(X ∧ ¬Y)       ¬(X ∧ Y)
Y ← X           ¬Y ← X

Table 6: Five different possibilities of representing requires and incompatibility relationships.

Study B is based on a within-subjects design (N=66) with two configuration knowledge bases. Knowledge base kbb consisted of a set of requires constraints and kbb consisted of a set of incompatibility constraints. Each study participant (again, computer science students who attended a related knowledge engineering course) had the task of finding a solution for the given CSP. Each participant was confronted with one version of kbb and one version of kbb, following the schema depicted in Table 6. For example, if a student received the X → Y version of kbb then she/he also received the X → ¬Y version of kbb. The knowledge bases kbb and kbb were (again) defined in a domain-independent fashion (see Study A). The basic properties of the used knowledge bases are summarized in Table 7.

Knowledge base    variables (v_i ∈ V)    v_i domain size    constraints (c_i ∈ C)
[Rows kbb and kbb; cell values not recoverable.]

Table 7: Knowledge bases used in Study B.

The outcome of this experiment is shown in Table 8.

kbb: SOL errors    kbb: SOL errors
X → Y              X → ¬Y
¬X ∨ Y             ¬X ∨ ¬Y
¬Y → ¬X            Y → ¬X
¬(X ∧ ¬Y)          ¬(X ∧ Y)
Y ← X              ¬Y ← X
[Error values not recoverable.]

Table 8: […] (SOL) depending on constraint representation.

A result of the study is that basic implications (→) should be preferred to other representations in order to maximize understandability.
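The five notations in each column of Table 6 are meant to carry the same semantics, which is what makes a refactoring between them semantics-preserving. As a minimal sketch, the following Python snippet checks this by brute force over all truth assignments of X and Y. The dictionary names, the helper functions, and the ASCII spelling of the formulas are assumptions made for this illustration; Y ← X is read as the reverse notation of X → Y.

```python
from itertools import product
from typing import Callable, Dict

Formula = Callable[[bool, bool], bool]


def implies(a: bool, b: bool) -> bool:
    """Material implication a -> b."""
    return (not a) or b


# Left column of Table 6 (requires relationship).
REQUIRES: Dict[str, Formula] = {
    "X -> Y": lambda x, y: implies(x, y),
    "not X or Y": lambda x, y: (not x) or y,
    "not Y -> not X": lambda x, y: implies(not y, not x),
    "not (X and not Y)": lambda x, y: not (x and not y),
    "Y <- X": lambda x, y: implies(x, y),  # reverse notation of X -> Y
}

# Right column of Table 6 (incompatibility relationship).
INCOMPATIBILITY: Dict[str, Formula] = {
    "X -> not Y": lambda x, y: implies(x, not y),
    "not X or not Y": lambda x, y: (not x) or (not y),
    "Y -> not X": lambda x, y: implies(y, not x),
    "not (X and Y)": lambda x, y: not (x and y),
    "not Y <- X": lambda x, y: implies(x, not y),  # reverse notation
}


def equivalent(forms: Dict[str, Formula]) -> bool:
    """True if all formulas have the same truth table over X and Y."""
    tables = {
        tuple(f(x, y) for x, y in product([False, True], repeat=2))
        for f in forms.values()
    }
    return len(tables) == 1
```

Calling `equivalent(REQUIRES)` and `equivalent(INCOMPATIBILITY)` compares the truth tables within each column. A refactoring rule that rewrites one notation into another could use such a check as a safeguard, although the studies reported here address understandability rather than the checking of equivalence.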
The only type of knowledge representation with a similar performance is the reverse implication; however, when comparing both alternatives, the standard implication seems to be the better choice.

4 Related Work

There is a long history of research on the improvement of knowledge engineering processes. Early research focused on model-based knowledge representations that allowed a separation of domain and problem solving knowledge. Constraint technologies are an example of such a representation; they became extremely popular as a technological basis for industrial applications [Freuder, 1997]. In a next step, graphical knowledge representations [Felfernig et al., 2000] and intelligent techniques for knowledge base testing and debugging have been developed [Felfernig et al., 2004]. The need for intuitive access to a corpus of software artifacts is also one of the major requirements for software comprehension [Storey, 2006]. In this context, recommender systems [Jannach et al., 2010] have already been identified as a valuable means to provide intelligent support for the navigation in large and complex software spaces (see, e.g., [Robillard et al., 2010]). The application of recommendation technologies for supporting knowledge engineering processes is a new research area. Research contributions in this field have the potential to significantly improve the overall quality of knowledge engineering processes. In [Felfernig et al., 2010] basic knowledge representations are compared, for example, the use of → to represent an implication vs. the use of ¬ and ∨. This work is an important step towards a discipline of empirical knowledge engineering with a clear focus on usability aspects and cognitive efforts needed to complete knowledge engineering tasks. The work presented in this paper is a continuation of the work of [Felfernig et al., 2010].
It takes a more detailed look at different alternative representations of requires and incompatibility relationships and introduces new concepts related to the content-based clustering of constraints.

5 Conclusions

In this paper we showed how recommenders can be exploited to support knowledge engineering tasks. Examples are collaborative filtering of constraint sets, clustering of constraints, and knowledge-based recommendation of refactoring operations. Future work will include the development of further recommendation algorithms, for example, the inclusion of content-based filtering and further clustering algorithms, as well as further empirical studies with more differentiated maintenance tasks. Finally, we will focus on an in-depth analysis of existing research in the area of cognitive psychology which can further advance the state of the art in (configuration) knowledge engineering.
References

[Burke, 2000] R. Burke. Knowledge-based recommender systems. Library and Inf. Systems, 69(32):180–200, 2000.
[Felfernig and Burke, 2008] A. Felfernig and R. Burke. Constraint-based recommender systems: Technologies and research issues. In ACM International Conference on Electronic Commerce (ICEC08), pages 17–26, 2008.
[Felfernig et al., 2000] A. Felfernig, G. E. Friedrich, and D. Jannach. UML as Domain Specific Language for the Construction of Knowledge-based Configuration Systems. IJSEKE, 10(4):449–469, 2000.
[Felfernig et al., 2001] A. Felfernig, G. Friedrich, and D. Jannach. Conceptual modeling for configuration of mass-customizable products. Artificial Intelligence in Engineering, 15(2):165–176, 2001.
[Felfernig et al., 2004] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner. Consistency-based diagnosis of configuration knowledge bases. Artificial Intelligence, 152(2):213–234, 2004.
[Felfernig et al., 2009] A. Felfernig, G. Friedrich, M. Schubert, M. Mandl, M. Mairitsch, and E. Teppan. Plausible repairs for inconsistent requirements. In , pages 791–796, Pasadena, CA, 2009.
[Felfernig et al., 2010] A. Felfernig, M. Mandl, A. Pum, and M. Schubert. Empirical knowledge engineering: Cognitive aspects in the development of constraint-based recommenders. In IEA/AIE 2010, pages 631–640, Cordoba, Spain, 2010.
[Felfernig et al., 2013] A. Felfernig, M. Schubert, and S. Reiterer. Personalized Diagnosis for Over-Constrained Problems. In , Peking, China, 2013.
[Freuder, 1997] E. Freuder. In pursuit of the holy grail. Constraints, 2(1):57–61, 1997.
[Huffman and Kahn, 1998] C. Huffman and B. Kahn. Variety for Sale: Mass Customization or Mass Confusion. Journal of Retailing, 74:491–513, 1998.
[Jannach et al., 2010] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems. CUP, 2010.
[Junker, 2004] U. Junker. Quickxplain: Preferred explanations and relaxations for over-constrained problems. In , pages 167–172, San Jose, CA, 2004.
[Konstan et al., 1997] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl. Grouplens: applying collaborative filtering to usenet news. Communications of the ACM, 40(3):77–87, 1997.
[McDermott, 1982] J. McDermott. R1: A Rule-based Configurer of Computer Systems. Artificial Intelligence Journal, 19:39–88, 1982.
[Pazzani and Billsus, 1997] M. Pazzani and D. Billsus. Learning and revising user profiles: the identification of interesting websites. Mach. Learn., 27:313–331, 1997.
[Robillard et al., 2010] M. Robillard, R. Walker, and T. Zimmermann. Recommendation systems for software engineering. IEEE Software, 27(4):80–86, 2010.
[Soloway, 1987] E. Soloway et al. Assessing the Maintainability of XCON-in-RIME: Coping with the Problem of very large Rule-bases. In Proc. of AAAI-87, pages 824–829, Seattle, Washington, USA, July 13–17 1987.
[Storey, 2006] M. Storey. Theories, tools and research methods in program comprehension: past, present and future. Software Quality Journal, 14:187–208, 2006.
[Stumptner et al., 1998] M. Stumptner, G. Friedrich, and A. Haselböck. Generative Constraint-based Configuration of Large Technical Systems. AI EDAM, 12(4):307–320, 1998.
[Witten and Frank, 2005] I. Witten and E. Frank.