Case Level Counterfactual Reasoning in Process Mining
Mahnaz Sadat Qafari and Wil van der Aalst
Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Aachen, Germany
[email protected], [email protected]
Abstract.
Process mining is widely used to diagnose processes and uncover performance and compliance problems. It is also possible to see relations between different behavioral aspects, e.g., cases that deviate more at the beginning of the process tend to get delayed in the last part of the process. However, correlations do not necessarily reveal causalities. Moreover, standard process mining diagnostics do not indicate how to improve the process. This is the reason we advocate the use of structural equation models and counterfactual reasoning. We use results from causal inference and adapt these to be able to reason over event logs and process interventions. We have implemented the approach as a ProM plug-in and have evaluated it on several data sets. Our ProM plug-in produces recommendations that indicate how specific cases could have been handled differently to avoid a performance or compliance problem.

Keywords:
Process mining · Counterfactual statement · Classification.
1 Introduction

Statements like "I should have done it differently." or "What if something had been different?" show that, as humans, we tend to learn from past experiences and use them to adapt our behaviour. Such "what if" questions, in which the "if" part is different from what happened in reality, are known as counterfactual questions. Humans tend to learn from the past (their experiences) by counterfactual thinking, reflecting on their findings and aiming for better results in future similar cases (e.g., not making the same mistakes). Using counterfactual thinking, we explain why an outcome is as it is by comparing two outcomes: the one that happened in reality and the one that would have happened under the exact same conditions, with some minor changes.

The information systems of companies save data about process instances (cases) in their event logs. Process mining extracts knowledge from event logs to discover process models, monitor process KPIs, and improve processes. Process improvement requires a deep comprehension of the process behavior and its causes, not just at the process level but also at the case level.

In this paper, we tailor the concept of counterfactual thinking to process mining and explain why a specific situation has a specific outcome. Given an instance with an undesirable outcome (which we call the current instance), we aim at providing a set of counterfactual statements (which we call explanations) to explain why it happened. This is an intricate task as processes are complicated. A process may involve many steps and in each step, many factors may be of influence. Also, the steps, or the order of the steps that are taken for each case, may vary.
These complications make counterfactual reasoning about processes challenging.

Providing explanations at the case level has several applications:
– Customer satisfaction perspective: by explaining to the customer "why he/she received a specific result", "was it justified or discriminatory", and "what he/she can do in the next similar situation to get a different result", companies can build trust with their customers. This can be done without mentioning the details of the process, which may include sensitive information of the company or may put other people's rights and privacy in danger [8].
– Process manager perspective: explaining why something happened in a specific case and how to behave differently to prevent similar undesirable results in similar cases in the future.

By applying the counterfactual thinking paradigm in the area of process mining, companies can boost their interpretability and accountability. This is possible only if the explanations are accurate. However, having an accurate but not actionable explanation is not satisfactory. There are often several explanations for a situation, not all of which are applicable. The applicability of an explanation can be improved by distinguishing correlation from causation among the process features, which prevents explanations that recommend altering features with non-causal relationships to the undesirable result. Explanations that are based on correlation and not causation can be misleading, and acting upon them may cause more damage than good in future similar situations. To overcome this hurdle, we propose using a structural equation model of the features in the process to anticipate the outcome of the counterfactual situation. In such a causal equation model, the value of each feature is determined by the values of other features (and some independent noise).

Another complication regarding applicable explanations is that the set of actionable features is case sensitive.
For example, to prevent unwanted delays in a task, we may allocate more resources to it or we may redefine the task in a simpler manner. For two tasks A and B, even though both options are possible, the former may be more desirable for task A and the latter for task B. To overcome this issue, the proposed method presents the user a set of diverse explanations, and the user can decide for himself/herself which one to apply. By diverse explanations, we mean a set of explanations that differ from the current instance in different features.

As the explanations are meant to be used by humans, their readability and understandability are important. For that, each explanation needs to be as simple as possible. Therefore the number of features with different values in the explanation and the current instance should be small [11].

The general overview of the proposed method is presented in Figure 1. In the first step, we enrich the event log, and then several random counterfactual instances similar to the current instance are generated (which may or may not be present in the event log). Among the generated counterfactual instances, those that have a desirable outcome regarding a given threshold are selected, and optimization techniques are used to make them as close as possible to the current instance. The resulting desirable counterfactual instances are ordered according to their distance to the current instance and, finally, converted into a set of explanations and presented to the people involved.

Fig. 1.
The general overview of the proposed method.
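As a rough illustration, the generate-filter-rank pipeline described above can be sketched in a few lines of Python. All names here are hypothetical placeholders, not the API of the ProM plug-in; the sketch only mirrors the steps of Figure 1.

```python
# Illustrative sketch of the pipeline in Figure 1. All names are hypothetical
# placeholders, not the actual ProM plug-in API.
import random

def explain(current, predict_outcome, actionable, values, threshold, k=8):
    """Generate up to k counterfactual explanations for `current`, an
    instance whose predicted outcome exceeds the desirable threshold."""
    desirable = []
    for _ in range(1000):
        # perturb a random non-empty subset of the actionable features
        subset = random.sample(actionable, random.randint(1, len(actionable)))
        candidate = dict(current)
        for feature in subset:
            candidate[feature] = random.choice(values[feature])
        if predict_outcome(candidate) <= threshold:   # keep desirable ones
            desirable.append(candidate)
    # prefer sparse candidates: few features changed w.r.t. `current`
    desirable.sort(key=lambda c: sum(c[f] != current[f] for f in actionable))
    return desirable[:k]
```

A desirable candidate that changes few features corresponds to a short, readable explanation; the actual method additionally optimizes distance and diversity, as described in Section 4.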
The rest of the paper is organized as follows. In Section 2, a brief overview of related work is presented. In Section 3, process mining and causality theory preliminaries are presented. In Section 4, the proposed method is presented. The experimental results are presented in Section 5 and discussed in Section 6. Finally, Section 7 presents the conclusion and future work.
2 Related Work

There are already several approaches in the domain of process mining that deal with root cause analysis. For example, in [10,2] root cause analysis is done using classification techniques. The drawback of these methods is that classification techniques are based on correlation and not causal relationships. Also, there are several works considering causal relationships among different features of a process. For example, in [4,7] the goal is to find the structural equation model of the process features at the process level. Also, in [3] cause-effect relations between a range of business process characteristics and process performance indicators are investigated using time series analysis. There is also existing work on finding the root of a problem at the case level. In [1], a framework for generating case-level treatment recommendations is proposed such that the treatment recommendations maximize the probability of a given outcome. In this method, a subset of candidate treatments that are most correlated with the outcome is extracted by applying an association rule mining technique. Then the subgroups with a causal relation between treatment and outcome are identified using an uplift tree. Finally, the subgroups are sorted by the ratio of the score associated to them by the uplift trees and their cost. It is worth noting that counterfactual reasoning for explainability has been studied extensively in the field of data mining and machine learning (e.g., [9,11]).

3 Preliminaries

First, we compactly describe a few basic notations for process mining and causal inference theory.
Process mining techniques start from an event log extracted from an information system. An event log is a collection of traces where each trace itself is a collection of events. An event log may include three different levels of attributes: log level attributes, trace level attributes, and event level attributes. We denote the universe of attribute names by U_att, where actName, timestamp ∈ U_att. actName indicates the activity name and timestamp indicates the timestamp of an event. Also, we denote the universe of values by U_val and define values ∈ U_att → P(U_val) as a function that returns the set of all possible values for a given attribute name. We define ⊥ as a member of U_val such that ⊥ ∉ values(at) for all at ∈ U_att. We use this symbol to indicate that the value of an attribute is unknown, undefined, or missing. Let U_map = {m ∈ U_att ⇸ U_val | ∀at ∈ dom(m) : m(at) ∈ values(at)} be the universe of all mappings from a set of attribute names to attribute values of the correct type. We define an event as follows:

Definition 1 (Event).
Each element e ∈ U_map, where e(actName) ≠ ⊥ and e(timestamp) ≠ ⊥, is an event, and E is the universe of all possible events. E+ is the set of all non-empty chronologically ordered sequences of events. If ⟨e_1, ..., e_n⟩ ∈ E+, then for all 1 ≤ i < j ≤ n, e_i(timestamp) ≤ e_j(timestamp).

Here we assume that events are unique; i.e., an event appears just once in a trace or event log. This can be ensured by adding an extra identity attribute. We can group events based on their properties. For at ∈ U_att and V ⊆ values(at), we can group the events in E for which the value of attribute at is in V as follows: group(at, V) = {e ∈ E | e(at) ∈ V}. For example, a group of events can be the set of events that are done by specific resources, start in a specific period of time during the day, have specific activity names, or have a specific duration. We denote the universe of all event groups by G = P(E). Based on the definition of an event, we define an event log as follows:

Definition 2 (Event Log).
We define U_L = E+ ⇸ U_map as the universe of all event logs. We call each element (σ, m) ∈ L a trace, where L ∈ U_L is an event log.

When we need to consider features not existing in the event log, we have to enrich the event log. This is done by adding new features to traces and events. We can compute the value of these derived features from the event log or possibly other sources. We can define many different derived features related to any of the process perspectives: the time perspective, the data-flow perspective, the control-flow perspective, the conformance perspective, or the resource/organization perspective of the process.

One of the main assumptions that we make while extracting data from an event log is that only the features that have been recorded before the occurrence of a specific feature can have a causal effect on it. So the relevant part of a trace for a given feature is a prefix of it, which we call a situation. Let prfx(⟨e_1, ..., e_n⟩) = {⟨e_1, ..., e_i⟩ | 1 ≤ i ≤ n} be the function that returns all non-empty prefixes of a given sequence; then the definition of a situation is as follows:

Definition 3 (Situation). U_situation = E+ × U_map is the universe of all situations. We call each element (σ, m) ∈ U_situation a situation. Considering G ∈ G, we define the G-based situation subset of L as S_{L,G} = {(⟨e_1, ..., e_n⟩, m) | ∃(σ, m) ∈ L : ⟨e_1, ..., e_n⟩ ∈ prfx(σ) ∧ e_n ∈ G}, and the trace-based situation subset of L as S_{L,⊥} = L.

(In this paper, it is assumed that the reader is familiar with sets, multi-sets, and functions. P(X) is the set of non-empty subsets of a set X ≠ ∅. Let X and Y be two sets; f : X ⇸ Y is a partial function. The domain of f, a subset of X, is denoted by dom(f). We write f(x) = ⊥ if x ∉ dom(f).)
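To make the definitions above concrete, the following Python sketch represents events as attribute mappings (plain dicts) and extracts the G-based situations of a single trace. Attribute names and values are illustrative.

```python
# A minimal sketch of the notions above: events are attribute mappings
# (plain dicts); a situation is a trace prefix plus the trace attributes.
# Attribute names and values are illustrative.
trace = [  # chronologically ordered events of one case
    {"actName": "Feasibility Study", "timestamp": 1, "Complexity": 3},
    {"actName": "Team Charter", "timestamp": 2, "NumEmployees": 4},
    {"actName": "Development", "timestamp": 3},
]
trace_attrs = {"ImplementationDuration": 40}

def group(events, at, V):
    """group(at, V): events whose value for attribute `at` lies in V."""
    return [e for e in events if e.get(at) in V]

def situations(trace, trace_attrs, G):
    """All prefixes of `trace` whose last event belongs to group G."""
    return [(trace[:i + 1], trace_attrs)
            for i in range(len(trace)) if trace[i] in G]

G = group(trace, "actName", {"Team Charter"})
sits = situations(trace, trace_attrs, G)
```

Here the only situation is the prefix of length two, since only its last event belongs to the group of team charter events.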
To distinguish trace and event level attributes, we use situation features, each a pair of a name and a function. The function is defined over situations and is identified by a group of events G (possibly G = ⊥) and an attribute name at. More formally:

Definition 4 (Situation Feature).
Let U_sfn = U_att × (G ∪ {⊥}) be the universe of situation feature names and U_sfFunction = U_situation ⇸ U_val be the universe of all situation feature functions. We define U_sf = U_sfn × U_sfFunction as the universe of situation features. Given at ∈ U_att and G ∈ G ∪ {⊥}, consider the situation feature ((at, G), f) ∈ U_sf, where, slightly overloading notation, we also write (at, G) for its function f. If G = ⊥, we have (at, G)((σ, m)) = m(at), and if G ∈ G, we have (at, G)((σ, m)) = e(at), where e = argmax_{e′ ∈ G ∩ {e″ | e″ ∈ σ}} e′(timestamp) and (σ, m) ∈ U_situation. Moreover, we define a situation feature extraction plan as SFN ⊆ U_sfn, where
SFN ≠ ∅.

Given a situation (σ, m): if G = ⊥, then (at, G)((σ, m)) returns the value of at at the trace level (i.e., m(at)). However, if G ≠ ⊥, then (at, G)((σ, m)) returns the value of at in the event e ∈ σ with the maximum timestamp among those events of σ that belong to G (i.e., the event of σ in G that happens last). Given a situation feature extraction plan SFN, we can map each situation to a data point. We do that as follows:
Definition 5 (Instance).
Given a situation s ∈ U_situation and
SFN ⊆ U_sfn, where
SFN ≠ ∅, the instance inst_SFN(s) is defined as inst_SFN(s) ∈ SFN → U_val such that ∀sfn ∈ SFN : (inst_SFN(s))(sfn) = sfn(s). We denote the universe of all possible instances by U_instance = ⋃_{s ∈ U_situation} ⋃_{SFN ⊆ U_sfn, SFN ≠ ∅} {inst_SFN(s)}.

An instance is a set in which each element is a pair of a situation feature name and a value. With a slight abuse of notation, given a situation feature name sfn = (at, G), we define values(sfn) = values(at). We consider one of the situation feature names in a given SFN as the target situation feature name, denoted tf, and the rest as descriptive situation feature names. Given a situation feature extraction plan SFN, tf ∈ SFN, and an event log L, we can define a tabular data set, which we call a situation feature table, as the bag of instances derived from the situations in a proper situation subset of L, regarding SFN and tf. More formally:
Definition 6 (Situation Feature Table).
Let L ∈ U_L be an event log, SFN a situation feature extraction plan, and tf = (at, G) ∈ SFN where G ∈ G ∪ {⊥}. We define a situation feature table T_{L,SFN,tf} as follows: T_{L,SFN,tf} = [inst_SFN(s) | s ∈ S_{L,G}].

Note that according to Definition 6, if tf = (at, G) where G ∈ G, then the situation feature table T_{L,SFN,tf} includes the instances derived from the G-based situation subset S_{L,G} of L. However, if G = ⊥, then it includes the instances derived from the trace-based situation subset S_{L,⊥}.

Given a situation feature table T_{L,SFN,tf}, we can infer the structural equation model of its situation feature names. It can be provided by an expert who possesses process domain knowledge or can be inferred using several methods that already exist in the literature (e.g., [7,4]). Loosely speaking, a structural equation model is a data-generating model in the form of a set of equations. These equations are not normal equations but a way to determine how to generate the observational and interventional distributions. More formally:

Definition 7 (Structural Equation Model (SEM)).
Let T_{L,SFN,tf} be a situation feature table, where L ∈ U_L, SFN ⊆ U_sfn, and tf ∈ SFN. The SEM of T_{L,SFN,tf} is defined as EQ ∈
SFN → Expr(SFN), where for each sfn ∈ SFN, Expr(SFN) is an expression over the situation feature names in SFN and possibly some noise N_sfn. Moreover, the noise distributions N_sfn for all sfn ∈ SFN have to be mutually independent.
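Concretely, a SEM in the sense of Definition 7 can be sketched as one assignment per situation feature name plus an independent noise sampler, evaluated in an order compatible with the causal structure. The equations and noise ranges below are illustrative, not taken from any event log.

```python
# Sketch of a SEM: one equation per situation feature name plus an
# independent noise sampler, evaluated in an order compatible with the
# (acyclic) causal structure. Equations and noise ranges are illustrative.
import random

noise = {
    "A": lambda: random.uniform(0, 1),
    "B": lambda: random.uniform(0, 1),
}
equations = {
    # each right-hand side depends only on parents and the feature's noise
    "A": lambda v, n: n,
    "B": lambda v, n: 2 * v["A"] + n,  # A is the only parent of B
}
order = ["A", "B"]  # topological order of the causal structure

def sample():
    """Draw one instance from the observational distribution."""
    v = {}
    for f in order:
        v[f] = equations[f](v, noise[f]())
    return v
```

Evaluating the equations in topological order is what makes the acyclicity assumption below essential: with a loop, no such order would exist.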
Here, we assume that
SFN is causally sufficient, which means SFN includes all relevant situation feature names and there is no hidden common confounder for the situation feature names in
SFN. Also, we assume that the SEM does not include any loop, which means: given a SEM EQ over the SFN of a situation feature table T_{L,SFN,tf}, for each sfn ∈ SFN, the right-hand side of the expression sfn = Expr(SFN) in EQ does not include sfn. Given EQ, the situation feature names that appear in the right-hand side of the expression sfn = Expr(SFN) are called its parents; they are the situation feature names with a direct causal effect on it. The structure of the causal relationships between the situation feature names in a SEM can be encoded as a directed acyclic graph, called the causal structure. Given a SEM on a set of situation feature names SFN, each vertex in its corresponding causal structure corresponds to one of the situation feature names in
SFN, and there is a directed edge from sfn_1 to sfn_2 if sfn_1 appears in the right-hand side of the expression EQ(sfn_2), where sfn_1, sfn_2 ∈ SFN. (Definition 7 is based on [6].)

As an example, consider an IT company which implements software for its customers. This company does not maintain the released software. Figure 2 shows the
Petri-net model of this company. Each trace in the event log corresponds to the implementation process of one project. There are some cases for which the manager of the company believes the implementation phase should have taken less time, so the target situation feature name is the duration of the implementation phase. The manager believes that the following attributes of a case might

Fig. 2.
The process of the IT company in Section 5.1

have a causal effect on the duration of its implementation phase:
1. Priority, an attribute of the feasibility study activity, which indicates how urgent the software is for the customer.
2. Complexity, an attribute of the feasibility study activity, which indicates the hardness of a project.
3. The number of employees working on that project, recorded as an attribute of the team charter activity.
4. The amount of time, in person-days, spent on the product backlog activity for that project.

To avoid notationally complicated formulas, we denote the relevant situation feature names as follows:
– Complexity: (Complexity, G_1) where G_1 = {e ∈ E | e(actName) = Feasibility Study}
– Priority: (Priority, G_1).
– ProductBacklogDuration: (Duration, G_2) where G_2 = {e ∈ E | e(actName) = Product Backlog}
– NumEmployees: (Number of employees, G_3) where G_3 = {e ∈ E | e(actName) = Team Charter}
– ImplementationDuration: (SubModelDuration, ⊥).

ImplementationDuration is the target situation feature name and SFN = {Complexity, Priority, NumEmployees, ProductBacklogDuration, ImplementationDuration}. In Table 1 we provide two SEMs for the situation feature names in SFN: the upper one is a linear and the lower one a non-linear SEM. The causal structure of both of these SEMs is depicted in Figure 3.3. According to this causal structure, spending more or less time on the product backlog does not have any effect on the duration of the implementation phase. The reason is that there is no causal relation between them, even though they are correlated. (By implementation phase, we mean the sub-model including the two transitions "development" and "test", marked with a blue rectangle in Figure 2.)

Linear structural equation model
(· denotes a numeric constant omitted in this table)
Complexity = N_Complexity, with N_Complexity ∼ Uniform(1, ·)
Priority = N_Priority, with N_Priority ∼ Uniform(1, ·)
ProductBacklogDuration = Complexity + N_ProductBacklogDuration, with N_ProductBacklogDuration ∼ Uniform(−·, ·)
NumEmployees = Complexity + Priority + N_NumEmployees, with N_NumEmployees ∼ Uniform(−·, ·)
ImplementationDuration = Complexity + NumEmployees + N_ImplementationDuration, with N_ImplementationDuration ∼ Uniform(10, ·)

Non-linear structural equation model
Complexity = N_Complexity, with N_Complexity ∼ Uniform(1, ·)
Priority = N_Priority, with N_Priority ∼ Uniform(1, ·)
ProductBacklogDuration = Complexity + ⌊Complexity/·⌋ + N_ProductBacklogDuration, with N_ProductBacklogDuration ∼ Uniform(−·, ·)
NumEmployees = Complexity·√Complexity + Priority + N_NumEmployees, with N_NumEmployees ∼ Uniform(−·, ·)
ImplementationDuration = Complexity + Complexity·NumEmployees·√(NumEmployees − ·) + ·√(mod(NumEmployees, ·)) + N_ImplementationDuration, with N_ImplementationDuration ∼ Uniform(10, ·)
Table 1.
Linear (top) and nonlinear (bottom) SEM for the IT company in Section 3.3.
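The dependency structure of the linear SEM can be simulated as follows. Since not all numeric constants of Table 1 are reproduced above, the coefficients and noise bounds below are illustrative assumptions; only the parent sets follow the table.

```python
# Runnable version of the linear SEM's dependency structure. The concrete
# coefficients and noise bounds are illustrative assumptions; only the
# parent sets follow Table 1.
import random

def sample_case():
    c = random.uniform(1, 5)                    # Complexity
    p = random.uniform(1, 3)                    # Priority
    pb = c + random.uniform(-1, 1)              # ProductBacklogDuration
    ne = c + p + random.uniform(-1, 1)          # NumEmployees
    impl = c + ne + random.uniform(10, 15)      # ImplementationDuration
    return {"Complexity": c, "Priority": p, "ProductBacklogDuration": pb,
            "NumEmployees": ne, "ImplementationDuration": impl}

# ProductBacklogDuration never occurs in the equation for
# ImplementationDuration, yet both depend on Complexity; in sampled data
# they are therefore correlated without any causal link between them.
```

Sampling many cases from this model reproduces the phenomenon noted above: ProductBacklogDuration and ImplementationDuration correlate through their common cause Complexity, although neither appears in the other's equation.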
4 Method

Given a situation feature table T_{L,SFN,tf}, we distinguish one of its instances, with an undesirable value for the target situation feature name, as the current instance, denoted cinst. Moreover, using T_{L,SFN,tf}, we can infer the SEM of the situation feature names in SFN. We use this SEM to generate a set of counterfactual explanations for cinst. As it is futile to investigate the effect of assigning different values to situation feature names whose values cannot be altered by the user, we study only the effect of changing the values of those situation feature names that are modifiable by the user. We call the set of modifiable situation feature names actionable situation feature names and denote it by ASF.

A set of counterfactual explanations, which describes the reasons for the undesirable result in the current instance, is a set of alternative instances. More formally,
Definition 8 (Set of Counterfactual Explanations).
Let
SFN ⊆ U_sfn be a situation feature extraction plan, tf ∈ SFN be the target situation feature name, and
ASF ⊆ SFN, where
ASF ≠ ∅ and tf ∉ ASF, be a set of actionable situation feature names. Consider EQ to be a SEM over SFN and cinst : SFN → U_val the current instance. A set of counterfactual explanations is defined as Ex_{EQ,SFN,ASF,tf}(cinst) ⊆ SFN → U_val.

To generate the set of counterfactual explanations for the current instance cinst, we take the following three steps: generating candidates, predicting the value of the target situation feature name, and selecting a subset of candidates.
1. Generating candidates.
In the first step, we generate several candidates for the values that could have been assigned to the actionable situation feature names of the current instance. Each candidate is a member of ⋃_{A ⊆ ASF, A ≠ ∅} A → U_val. There are several ways to generate candidates. We do it by generating several candidates for each non-empty subset A ⊆ ASF such that, for half of the generated candidates, the value of each situation feature name sfn ∈ A is selected from the distribution of sfn in the situation feature table, and for the other half, it is selected randomly from values(sfn).

2. Predicting the value of the target situation feature name.
In this step we compute the effect, on the value of the target situation feature name (and possibly other situation feature names), of replacing the values of the situation feature names in the current instance with the values of the candidates generated in the previous step. If we do not have access to the SEM of the situation feature names in SFN, then we can use a machine learning technique, like a neural network, and train it using T_{L,SFN,tf}. For each candidate c generated in the previous step, we can generate an instance inst : SFN \ {tf} → U_val in which inst(sfn) = c(sfn) if sfn ∈ dom(c) and inst(sfn) = cinst(sfn) if sfn ∈ SFN \ (dom(c) ∪ {tf}). Then, we can use the trained model to predict the value of tf for that instance. But if we have access to the SEM EQ, then predicting the value of tf for each candidate involves three steps: abduction, action, and prediction [5]. We explain these steps using an example.

– Abduction. First we need to incorporate the observed data, cinst, into the model, EQ, and generate a counterfactual SEM that explains the conditions and the behavior of the system and the environment when cinst was happening. A counterfactual SEM, EQ′, is obtained by replacing the distribution of the noise terms in EQ with the corresponding noise distributions conditioned on SFN = cinst.

Example 1.
Suppose that, in the example mentioned in Section 3.3, we select as current instance {(Complexity, ·), (Priority, ·), (NumEmployees, ·), (ProductBacklogDuration, ·), (ImplementationDuration, ·)}, where · stands for a concrete value omitted here. The counterfactual SEM EQ′ then fixes the noise terms, yielding equations of the form:
Complexity = ·
Priority = ·
ProductBacklogDuration = · × Complexity + ·
NumEmployees = · × Complexity + · × Priority + ·
ImplementationDuration = · × Complexity + · × NumEmployees + ·

– Action. The second step is taking action toward enforcing changes in the counterfactual SEM EQ′, regarding candidate c. The result is a SEM EQ″ where sfn = c(sfn) if sfn ∈ dom(c) and sfn = EQ′(sfn) if sfn ∉ dom(c).

Example 2.
Suppose that we are interested in predicting the value of
ImplementationDuration for the candidate (
Priority, ·). The intervened SEM EQ″ is then of the form:
Complexity = ·
Priority = ·
ProductBacklogDuration = · × Complexity + ·
NumEmployees = · × Complexity + · × Priority + ·
ImplementationDuration = · × Complexity + · × NumEmployees + ·

– Prediction. The third step involves using the modified SEM to predict the counterfactual value of the target situation feature name. Prediction is simply done by computing the value (or the distribution) of the target situation feature name in the counterfactual SEM under the intervention, EQ″. In this step, we remove from the domain of c those situation feature names that do not play any role in determining the value of the target situation feature name regarding EQ″.

Example 3.
After computing the values of the situation feature names, we will have: {(Complexity, ·), (Priority, ·), (NumEmployees, ·), (ProductBacklogDuration, ·), (ImplementationDuration, ·)}.

3. Selecting a subset of candidates.
We look for a set of close candidates as explanations, which must satisfy four conditions. First, the predicted value of the target situation feature name in the generated instance must be desirable regarding a threshold t provided by the user; we call these candidates desirable candidates. Second, the domain of each candidate has the fewest possible number of situation feature names. Third, they are as close as possible to the current instance, according to a distance function d : U_instance × U_instance → R. Fourth, they differ from each other as much as possible regarding the set of situation feature names in their domain. For the distance function d(·, ·), we use the L1 metric on the normalized situation feature names. As mentioned in [11], using the L1 metric, more sparse explanations are generated. For diversity, we partition the desirable candidates based on their domain and then sort the desirable candidates in each partition according to their L1 distance to the current instance. Desirable candidates are selected one by one from different partitions, giving priority to partitions that have fewer situation feature names in their domains and from which the fewest desirable candidates have been selected so far.

5 Experimental Results

We implemented the proposed method in ProM, which is a free and expandable platform for process mining algorithms. The implemented plugin can be found under the name counterfactual explanation in the ProM nightly build. In the implemented plugin, we use the L1 norm on the normalized situation features and the L1 norm weighted by the inverse median absolute deviation [11] as the distance functions to sort the desirable candidates. Also, we can apply several classifiers (including Regression Tree (RT), Locally Weighted Learning (LWL), and Multi-layer Perceptron (NN)), as well as a structural equation model, to predict the value of the target situation feature name for each candidate.
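When the SEM itself is used as the predictor, the abduction, action, and prediction steps of Section 4 take a particularly simple form for a linear equation: recover the noise term from the observation, intervene, and re-evaluate. The toy SEM below (Y = 2X + N_Y) and its numbers are illustrative, not taken from the paper.

```python
# Abduction-action-prediction on a toy linear SEM Y = 2*X + N_Y.
# The equation and its coefficient are illustrative.
def counterfactual_y(x_obs, y_obs, x_new):
    n_y = y_obs - 2 * x_obs   # abduction: noise consistent with observation
    # action: intervene by setting X to the candidate value x_new;
    # prediction: re-evaluate Y under the intervened SEM, same noise
    return 2 * x_new + n_y
```

For example, observing (x = 3, y = 7) fixes n_y = 1, so had x been 1, the model predicts y = 3. Note that a purely observational predictor need not agree with this value, which is what the synthetic experiments below measure.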
For diversity and for optimizing the distance between the desirable candidates and the current instance, we use a hill climbing algorithm, implementing a method that combines the methods mentioned in [11] and [9]. We optimize the current instance toward close desirable candidates by changing the values of different subsets of actionable situation feature names (similar to [9]), and we also optimize the distance of several randomly generated desirable candidates to the current instance (similar to [11]).

To evaluate the explanations generated by this method, we apply it to two synthetic event logs with known linear and nonlinear SEMs and to one real event log, from the BPI Challenge 2017. In the following, we first describe the data generation method and the results of applying this method to the synthetic event logs, and then we discuss the results for the real event log. The goal of applying the implemented plugin to the synthetic event logs is to see how different the predicted counterfactual value of a counterfactual instance might be when predicted by the SEM versus by a machine learning technique. In this part of the experiment, we have not used optimization on the selected desirable counterfactual instances.

5.1 Synthetic event logs
For the synthetic event logs, we have used the example mentioned in Section 3.3. We have generated two different event logs: in the first, the duration of the implementation phase of a case depends linearly on the values of the other situation feature names, and in the second, non-linearly. The SEMs of the generated event logs are presented in Table 1. For each event log, we generated 1000 traces. Then, using the implemented plugin, we generate a set of 8 explanations for both the linear and the nonlinear scenario. In this experiment, ImplementationDuration is the target situation feature name and
SFN = {Complexity, Priority, NumEmployees, ProductBacklogDuration, ImplementationDuration}. We used the structural equation model to evaluate possible samples and select the desirable candidates. We then used the classifiers to predict the value of ImplementationDuration for the selected desirable candidates, and investigated how close the values predicted by the machine learning techniques are to the true values, as well as how diverse the generated explanations are in terms of the number of situation feature names whose values differ between the current instance and the explanations. As the current instance and the desirable value threshold, we used the following settings:

– Linear case.
The current instance was an instance in which
Complexity = ·, Priority = ·, NumEmployees = ·, ProductBacklogDuration = · (concrete values omitted), and ImplementationDuration =
577 and the desirable threshold was 500. – Non-linear case.
The current instance was an instance in which
Complexity = ·, Priority = ·, NumEmployees = ·, ProductBacklogDuration = ·, and ImplementationDuration = · (concrete values and threshold omitted).

The resulting accuracies were as follows:
– Linear case. The accuracies of the machine learning techniques RT, LWL, and NN on the data were 0.818, 0.990, and 0.984, respectively. However, their accuracy on the counterfactual instances dropped to 0.74, 0.77, and 0.76, respectively.
– Non-linear case.
The accuracies of the machine learning techniques RT, LWL, and NN on the data were 0.986, 0.989, and 0.95, respectively. However, their accuracies dropped on the counterfactual instances to 0.74, 0.25, and 0.79, respectively.

The results of applying the proposed method using the structural equation model and the three mentioned machine learning techniques are presented in Figure 3. Figure 3 a. and c. show the predicted ImplementationDuration when applying the selected desirable candidates using the SEM (red line), RT (blue line), LWL (green line), and NN (light green line); part a. covers the linear case and part c. the non-linear case. Figure 3 b. and d. show the number of effective situation feature names in the domain of the selected desirable candidates; part b. covers the linear case and part d. the non-linear case.

For the real event log, we have used the BPI Challenge 2017 event log. This event log includes 1,202,267 events pertaining to 31,509 bank loan applications. Each trace has several attributes indicating the loan goal, the requested amount, and the application type. The value of the application type can be either 'New credit' or 'Limit raising'. Also, each trace includes an event whose activity name is O_Create Offer. This event includes attributes that indicate the amount of the first withdrawal, the number of terms, the monthly cost of the offered loan, the credit score of the loan applicant, and whether the loan offer was selected by the loan applicant.

Another event log, including the information of 42,995 offers created for these loan applicants, has also been provided in BPI Challenge 2017. We have enriched the BPI Challenge 2017 event log by adding an attribute, numOffers, to each trace, which indicates the number of offers provided for that applicant (computed from the second event log).

The question is why some of the loans have been rejected by some of the loan applicants, and whether it was possible to prevent rejecting the loan offers.
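The enrichment step described above, counting the offers created for each application in the second log, can be sketched with pandas. This is a minimal sketch: the column name `application_id` and the toy data are illustrative assumptions, not the actual attribute names or contents of the BPI Challenge 2017 logs.

```python
import pandas as pd

# Minimal stand-ins for the two BPI 2017 logs: one row per loan
# application, and one row per created offer. The "application_id"
# column name is a hypothetical placeholder.
applications = pd.DataFrame({"application_id": ["A1", "A2", "A3"]})
offers = pd.DataFrame({"application_id": ["A1", "A1", "A2"]})

# Count the offers created for each application in the offer log.
counts = (offers.groupby("application_id").size()
                .rename("numOffers").reset_index())

# Enrich the application log; applications without any offer get 0.
applications = applications.merge(counts, on="application_id", how="left")
applications["numOffers"] = applications["numOffers"].fillna(0).astype(int)
# applications now carries numOffers per trace: A1 -> 2, A2 -> 1, A3 -> 0.
```

The left join keeps every application, so traces with no recorded offer still receive a numOffers value of 0 rather than being dropped.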
Fig. 3. The result of applying the implemented method on the synthetic event logs.

Let G = { e ∈ E | e(actName) = O_Create Offer }. We have used (selected, G) as the class situation feature, where selected is encoded as 1 and rejected as 0. In this experiment, we set SFN = { (loanGoal, ⊥), (requestedAmount, ⊥), (numOffers, ⊥), (applicationType, ⊥), (creditScore, G), (firstWithdrawal, G), (NumberOfTerms, G), (monthlyCost, G), (selected, G) }. The set of actionable situation feature names ASF is { (numOffers, ⊥), (firstWithdrawal, G), (NumberOfTerms, G), (monthlyCost, G) }. We have used an NN to predict the value of the target situation feature name. Here are two examples of the generated explanations.

When the current instance is the instance corresponding to the loan applicant 'Application 1959700739', we get the following explanations:
– By setting (NumberOfTerms, G) to 30 instead of 48, the offer would be selected.
– By setting (firstWithdrawal, G) to 4586.9 instead of 5250, the offer would be selected.

These explanations mean that if the first withdrawal amount or the number of terms in the offer created by the bank had been lower, then the applicant would probably have selected the offer. We turned the resulting counterfactual explanations into textual, human-readable explanations.

As another example, consider setting the current instance to the instance corresponding to 'Application 557515885'; then we get the following explanations:
– By setting (NumberOfTerms, G) to 30 instead of 48, the offer would be selected.
– By setting (firstWithdrawal, G) to 1000 instead of 8500, the offer would be selected.
– By setting (monthlyCost, G) to 863.928 instead of 200.85 and setting (numOffers, ⊥) to 2 instead of 1, the offer would be selected.

These explanations mean that an offer with a lower first withdrawal amount or a lower number of terms would probably have been more desirable to this loan applicant.

Regarding the results demonstrated in Figure 3, we can see that there is a gap between the values predicted by the machine learning techniques and those predicted by the structural equation model. Also, the accuracy of the classifiers drops dramatically when predicting the values of the counterfactual instances. This phenomenon can be explained by the difference between the mechanisms for predicting counterfactual values using a structural equation model and using a machine learning technique.

When evaluating an instance generated from a candidate, we first derive, using the current instance and the structural equation model of the process, a counterfactual structural equation model that captures how the environment affected the current instance (by replacing the noise terms with the noise terms derived from the current instance). The next step is intervening on the counterfactual SEM as dictated by the candidate. Finally, the value of the target situation feature name is predicted in the modified SEM. A machine learning technique, on the other hand, considers neither the behavior of the environment nor the effect of an intervention. The instance generated from the candidate and the current instance is regarded as a new instance, which mainly results in wrong predictions. If we remove the situation feature names that have no causal relationship with the target situation feature name from the situation feature table, the accuracy of the classifiers on the counterfactual instances remains more or less the same.
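The three-step counterfactual computation described above (abduction of the noise terms from the current instance, intervention, and prediction in the modified SEM) can be illustrated on a toy linear SEM. This is only a sketch: the equations X := N_X and Y := 2X + N_Y and all numbers are made-up assumptions, not the SEM used in the paper.

```python
# Toy linear SEM (hypothetical): X := N_X,  Y := 2*X + N_Y

def counterfactual_y(x_obs, y_obs, x_new):
    """Predict what Y would have been had X been x_new,
    given the observed instance (x_obs, y_obs)."""
    # 1. Abduction: derive the noise term from the observed instance,
    #    fixing the environment of this particular case.
    n_y = y_obs - 2 * x_obs          # since Y := 2*X + N_Y
    # 2. Action: intervene on the SEM, replacing X's equation by X := x_new.
    x_cf = x_new
    # 3. Prediction: propagate through the modified SEM with the same noise.
    return 2 * x_cf + n_y

# Observed instance: x = 3, y = 10, so the abducted noise is N_Y = 4.
# Had X been 1, Y would have been 2*1 + 4 = 6.
print(counterfactual_y(3, 10, 1))  # 6
```

A purely predictive model fitted to observational data would instead estimate something like E[Y | X = 1], ignoring the instance-specific noise term, which is one way to understand the accuracy gap on counterfactual instances reported above.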
This indicates that the difference in the predicted value of the class situation feature comes from the difference in the prediction mechanisms.

The difference in the number of effective situation feature names with different values between the current instance and a candidate comes from the fact that machine learning techniques do not distinguish between situation feature names that have a causal relationship with the target situation feature name and those that are merely correlated with it. Using the SEM, on the other hand, changes to the values of situation feature names that have no causal relationship with the target situation feature name are simply ignored in the counterfactual instances.

We have presented a method to generate case-specific explanations in the context of process mining. In this method, given an instance with an undesirable result for the target situation feature name and a threshold for the desirable result, a large number of random candidates is generated. In the next step, the value of the target situation feature name is predicted for the instances generated by applying the changes suggested by each candidate to the current instance. An optimization method is then used to create close desirable candidates using the current instance and some of the desirable candidates generated in the previous step. Finally, a set of close and, at the same time, diverse desirable candidates is selected as the set of explanations.

Here we have assumed that the applicability of a situation feature is case dependent and is known only by the user for whom the explanations are generated.
So, a set of diverse explanations can help the user decide which one works best for them.

The results of the evaluations on the synthetic event logs have shown that ignoring the causal relationships among the situation features results in explanations that suggest changing situation features with no causal effect on the value of the class situation feature. Moreover, using a machine learning technique, regardless of its accuracy, to predict the value of the target situation feature may result in imprecise or wrong explanations, or in missing some of the good explanations.

As future work, we plan to adapt this method to the online setting.

References
1. Bozorgi, Z.D., Teinemaa, I., Dumas, M., La Rosa, M., Polyvyanyy, A.: Process mining meets causal machine learning: Discovering causal rules from event logs. In: ICPM (2020)
2. Ferreira, D.R., Vasilyev, E.: Using logical decision trees to discover the cause of process delays from event logs. Computers in Industry, 194–207 (2015)
3. Hompes, B.F., Maaradji, A., La Rosa, M., Dumas, M., Buijs, J.C., van der Aalst, W.M.: Discovering causal factors explaining business process performance variation. In: International Conference on Advanced Information Systems Engineering. pp. 177–192. Springer (2017)
4. Narendra, T., Agarwal, P., Gupta, M., Dechu, S.: Counterfactual reasoning for process optimization using structural causal models. In: Proceedings of the Business Process Management Forum. vol. 360, pp. 91–106. Springer (2019). https://doi.org/
, 87 (2011)
9. Russell, C.: Efficient search for diverse coherent explanations. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. pp. 20–28 (2019)
10. Suriadi, S., Ouyang, C., van der Aalst, W.M., ter Hofstede, A.H.: Root cause analysis with enriched process logs. In: International Conference on Business Process Management. pp. 174–186. Springer (2012)
11. Wachter, S., Mittelstadt, B., Russell, C.: Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech. 31