Case Level Counterfactual Reasoning in Process Mining
Mahnaz Sadat Qafari and Wil van der Aalst
Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Aachen, Germany
[email protected], [email protected]
Abstract.
Process mining is widely used to diagnose processes and uncover performance and compliance problems. It is also possible to see relations between different behavioral aspects, e.g., cases that deviate more at the beginning of the process tend to get delayed in the last part of the process. However, correlations do not necessarily reveal causalities. Moreover, standard process mining diagnostics do not indicate how to improve the process. This is the reason we advocate the use of structural equation models and counterfactual reasoning. We use results from causal inference and adapt these to be able to reason over event logs and process interventions. We have implemented the approach as a ProM plug-in and have evaluated it on several data sets. Our ProM plug-in produces recommendations that indicate how specific cases could have been handled differently to avoid a performance or compliance problem.

Keywords:
Process mining · Counterfactual statement · Classification.
1 Introduction

Statements like "I should have done it differently." or "What if something had been different?" show that, as humans, we tend to learn from past experiences and use them to adapt our behaviour. Such "what if" questions, in which the "if" part is different from what happened in reality, are known as counterfactual questions. Humans tend to learn from the past (their experiences) by counterfactual thinking, reflecting on their findings and aiming for better results in future similar cases (e.g., not making the same mistakes). Using counterfactual thinking, we explain why an outcome is as it is by comparing two outcomes: the one that happened in reality and the one that would have happened under the exact same conditions, with some minor changes.

The information systems of companies save data about process instances (cases) in their event logs. Process mining extracts knowledge from event logs to discover process models, monitor process KPIs, and improve processes. Process improvement requires a deep comprehension of the process behavior and its causes, not just at the process level but also at the case level.

In this paper, we tailor the concept of counterfactual thinking to process mining and explain why a specific situation has a specific outcome. Given an instance with an undesirable outcome (which we call the current instance), we aim at providing a set of counterfactual statements (which we call explanations) to explain why it happened. This is an intricate task as processes are complicated. A process may involve many steps and in each step, many factors may be of influence. Also, the steps, or the order of the steps that are taken for each case, may vary.
These complications make counterfactual reasoning about processes challenging.

Providing explanations at the case level has several applications:
– Customer satisfaction perspective: by explaining to the customer "why he/she received a specific result", "was it justified or discriminatory", and "what he/she can do in the next similar situation to get a different result", companies can build trust with their customers. This can be done without mentioning the details of the process, which may include sensitive information of the company or may put other people's rights and privacy in danger [8].
– Process manager perspective: explaining why something happened in a specific case and how to behave differently to prevent similar undesirable results in similar cases in the future.

By applying the counterfactual thinking paradigm in the area of process mining, companies can boost their interpretability and accountability. This is possible only if the explanations are accurate. However, having an accurate but not actionable explanation is not satisfactory. There are often several explanations for a situation, not all of which are applicable. The applicability of an explanation can be improved by distinguishing correlation from causation among the process features, which prevents explanations that recommend altering features with non-causal relationships to the undesirable result. Explanations that are based on correlation and not causation can be misleading, and acting upon them may cause more damage than good in future similar situations. To overcome this hurdle, we propose using a structural equation model of the features in the process to anticipate the outcome of the counterfactual situation. In such a causal equation model, the value of each feature is determined by the values of other features (and some independent noise).

Another complication regarding applicable explanations is that the set of actionable features is case sensitive.
For example, to prevent unwanted delays in a task, we may allocate more resources to it or we may redefine the task in a simpler manner. For two tasks A and B, even though both options are possible, the former may be more desirable for task A and the latter for task B. To overcome this issue, the proposed method presents the user a set of diverse explanations, and the user can decide for himself/herself which one to apply. By diverse explanations, we mean a set of explanations that differ from the current instance in different features.

As the explanations are meant to be used by humans, their readability and understandability are important. For that, each explanation needs to be as simple as possible. Therefore the number of features with different values in the explanation and the current instance should be small [11].

The general overview of the proposed method is presented in Figure 1. In the first step, we enrich the event log, and then several random counterfactual instances similar to the current instance are generated (which may or may not be present in the event log). Among the generated counterfactual instances, those that have a desirable outcome regarding a given threshold are selected, and optimization techniques are used to make them as close as possible to the current instance. The resulting desirable counterfactual instances are ordered according to their distance to the current instance and, finally, converted into a set of explanations and presented to the people involved.

Fig. 1.
The general overview of the proposed method.
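As a rough illustration, the generate-filter-rank pipeline described above can be sketched in a few lines of Python. All names here are hypothetical placeholders, not the API of the ProM plug-in; the sketch only mirrors the steps of Figure 1.

```python
# Illustrative sketch of the pipeline in Figure 1. All names are hypothetical
# placeholders, not the actual ProM plug-in API.
import random

def explain(current, predict_outcome, actionable, values, threshold, k=8):
    """Generate up to k counterfactual explanations for `current`, an
    instance whose predicted outcome exceeds the desirable threshold."""
    desirable = []
    for _ in range(1000):
        # perturb a random non-empty subset of the actionable features
        subset = random.sample(actionable, random.randint(1, len(actionable)))
        candidate = dict(current)
        for feature in subset:
            candidate[feature] = random.choice(values[feature])
        if predict_outcome(candidate) <= threshold:   # keep desirable ones
            desirable.append(candidate)
    # prefer sparse candidates: few features changed w.r.t. `current`
    desirable.sort(key=lambda c: sum(c[f] != current[f] for f in actionable))
    return desirable[:k]
```

A desirable candidate that changes few features corresponds to a short, readable explanation; the actual method additionally optimizes distance and diversity, as described in Section 4.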
The rest of the paper is organized as follows. In Section 2, a brief overview of related work is presented. In Section 3, process mining and causality theory preliminaries are presented. In Section 4, the proposed method is presented. The experimental results are presented in Section 5 and discussed in Section 6. Finally, Section 7 presents the conclusion and future work.
2 Related Work

There are already several approaches in the domain of process mining that deal with root cause analysis. For example, in [10,2] root cause analysis is done using classification techniques. The drawback of these methods is that classification techniques are based on correlation and not causal relationships. Also, there are several works considering causal relationships among different features of a process. For example, in [4,7] the goal is to find the structural equation model of the process features at the process level. Also, in [3] cause-effect relations between a range of business process characteristics and process performance indicators are investigated using time series analysis. There is also existing work on finding the root of a problem at the case level. In [1], a framework for generating case-level treatment recommendations is proposed such that the treatment recommendations maximize the probability of a given outcome. In this method, a subset of candidate treatments that are most correlated with the outcome is extracted by applying an association rule mining technique. Then the subgroups with a causal relation between treatment and outcome are identified using an uplift tree. Finally, the subgroups are sorted by the ratio of the score associated to them by the uplift trees and their cost. It is worth noting that counterfactual reasoning for explainability has been studied extensively in the field of data mining and machine learning (e.g., [9,11]).

3 Preliminaries

First, we compactly describe a few basic notations for process mining and causal inference theory.
Process mining techniques start from an event log extracted from an information system. An event log is a collection of traces where each trace itself is a collection of events. An event log may include three different levels of attributes: log level attributes, trace level attributes, and event level attributes. We denote the universe of attribute names by U_att, where actName, timestamp ∈ U_att. actName indicates the activity name and timestamp indicates the timestamp of an event. Also, we denote the universe of values by U_val and define values ∈ U_att → P(U_val) as a function that returns the set of all possible values for a given attribute name. We define ⊥ as a member of U_val such that ⊥ ∉ values(at) for all at ∈ U_att. We use this symbol to indicate that the value of an attribute is unknown, undefined, or missing. Let U_map = {m ∈ U_att ⇸ U_val | ∀at ∈ dom(m) : m(at) ∈ values(at)} be the universe of all mappings from a set of attribute names to attribute values of the correct type. We define an event as follows:

Definition 1 (Event).
Each element e ∈ U_map, where e(actName) ≠ ⊥ and e(timestamp) ≠ ⊥, is an event, and E is the universe of all possible events. E+ is the set of all non-empty chronologically ordered sequences of events. If ⟨e_1, ..., e_n⟩ ∈ E+, then for all 1 ≤ i < j ≤ n, e_i(timestamp) ≤ e_j(timestamp).

Here we assume that events are unique; i.e., an event appears just once in a trace or event log. This can be ensured by adding an extra identity attribute. We can group events based on their properties. For at ∈ U_att and V ⊆ values(at), we can group the events in E for which the value of attribute at is in V as follows: group(at, V) = {e ∈ E | e(at) ∈ V}. For example, a group of events can be the set of events that are done by specific resources, start in a specific period of time during the day, have specific activity names, or have a specific duration. We denote the universe of all event groups by G = P(E). Based on the definition of an event, we define an event log as follows:

Definition 2 (Event Log).
We define U_L = E+ ⇸ U_map as the universe of all event logs. We call each element (σ, m) ∈ L a trace, where L ∈ U_L is an event log.

When we need to consider features not existing in the event log, we have to enrich the event log. This is done by adding new features to traces and events. We can compute the value of these derived features from the event log or possibly other sources. We can define many different derived features related to any of the process perspectives: the time perspective, the data-flow perspective, the control-flow perspective, the conformance perspective, or the resource/organization perspective of the process.

One of the main assumptions that we make while extracting data from an event log is that only the features that have been recorded before the occurrence of a specific feature can have a causal effect on it. So the relevant part of a trace for a given feature is a prefix of it, which we call a situation. Let prfx(⟨e_1, ..., e_n⟩) = {⟨e_1, ..., e_i⟩ | 1 ≤ i ≤ n} be the function that returns all non-empty prefixes of a given sequence; then the definition of a situation is as follows:

Definition 3 (Situation). U_situation = E+ × U_map is the universe of all situations. We call each element (σ, m) ∈ U_situation a situation. Considering G ∈ G, we define the G-based situation subset of L as S_{L,G} = {(⟨e_1, ..., e_n⟩, m) | ∃(σ, m) ∈ L : ⟨e_1, ..., e_n⟩ ∈ prfx(σ) ∧ e_n ∈ G}, and the trace-based situation subset of L as S_{L,⊥} = L.

(In this paper, it is assumed that the reader is familiar with sets, multi-sets, and functions. P(X) is the set of non-empty subsets of a set X ≠ ∅. Let X and Y be two sets; f : X ⇸ Y is a partial function. The domain of f, a subset of X, is denoted by dom(f). We write f(x) = ⊥ if x ∉ dom(f).)
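To make the definitions above concrete, the following Python sketch represents events as attribute mappings (plain dicts) and extracts the G-based situations of a single trace. Attribute names and values are illustrative.

```python
# A minimal sketch of the notions above: events are attribute mappings
# (plain dicts); a situation is a trace prefix plus the trace attributes.
# Attribute names and values are illustrative.
trace = [  # chronologically ordered events of one case
    {"actName": "Feasibility Study", "timestamp": 1, "Complexity": 3},
    {"actName": "Team Charter", "timestamp": 2, "NumEmployees": 4},
    {"actName": "Development", "timestamp": 3},
]
trace_attrs = {"ImplementationDuration": 40}

def group(events, at, V):
    """group(at, V): events whose value for attribute `at` lies in V."""
    return [e for e in events if e.get(at) in V]

def situations(trace, trace_attrs, G):
    """All prefixes of `trace` whose last event belongs to group G."""
    return [(trace[:i + 1], trace_attrs)
            for i in range(len(trace)) if trace[i] in G]

G = group(trace, "actName", {"Team Charter"})
sits = situations(trace, trace_attrs, G)
```

Here the only situation is the prefix of length two, since only its last event belongs to the group of team charter events.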
To distinguish trace and event level attributes, we use situation features, each a pair of a name and a function. The function is defined over situations and is identified by a group of events G (possibly G = ⊥) and an attribute name at. More formally:

Definition 4 (Situation Feature).
Let U_sfn = U_att × (G ∪ {⊥}) be the universe of situation feature names and U_sfFunction = U_situation ⇸ U_val be the universe of all situation feature functions. We define U_sf = U_sfn × U_sfFunction as the universe of situation features. Given at ∈ U_att and G ∈ G ∪ {⊥}, consider the situation feature ((at, G), f) ∈ U_sf, where, slightly overloading notation, we also write (at, G) for its function f. If G = ⊥, we have (at, G)((σ, m)) = m(at), and if G ∈ G, we have (at, G)((σ, m)) = e(at), where e = argmax_{e′ ∈ G ∩ {e″ | e″ ∈ σ}} e′(timestamp) and (σ, m) ∈ U_situation. Moreover, we define a situation feature extraction plan as SFN ⊆ U_sfn, where
SFN ≠ ∅.

Given a situation (σ, m): if G = ⊥, then (at, G)((σ, m)) returns the value of at at the trace level (i.e., m(at)). However, if G ≠ ⊥, then (at, G)((σ, m)) returns the value of at in the event e ∈ σ with the maximum timestamp among those events of σ that belong to G (i.e., the event of σ in G that happens last). Given a situation feature extraction plan SFN, we can map each situation to a data point. We do that as follows:
Definition 5 (Instance).
Given a situation s ∈ U_situation and
SFN ⊆ U_sfn, where
SFN ≠ ∅, the instance inst_SFN(s) is defined as inst_SFN(s) ∈ SFN → U_val such that ∀sfn ∈ SFN : (inst_SFN(s))(sfn) = sfn(s). We denote the universe of all possible instances by U_instance = ⋃_{s ∈ U_situation} ⋃_{SFN ⊆ U_sfn, SFN ≠ ∅} {inst_SFN(s)}.

An instance is a set in which each element is a pair of a situation feature name and a value. With a slight abuse of notation, given a situation feature name sfn = (at, G), we define values(sfn) = values(at). We consider one of the situation feature names in a given SFN as the target situation feature name, denoted tf, and the rest as descriptive situation feature names. Given a situation feature extraction plan SFN, tf ∈ SFN, and an event log L, we can define a tabular data set, which we call a situation feature table, as the bag of instances derived from the situations in a proper situation subset of L, regarding SFN and tf. More formally:
Definition 6 (Situation Feature Table).
Let L ∈ U_L be an event log, SFN a situation feature extraction plan, and tf = (at, G) ∈ SFN where G ∈ G ∪ {⊥}. We define a situation feature table T_{L,SFN,tf} as follows: T_{L,SFN,tf} = [inst_SFN(s) | s ∈ S_{L,G}].

Note that according to Definition 6, if tf = (at, G) where G ∈ G, then the situation feature table T_{L,SFN,tf} includes the instances derived from the G-based situation subset S_{L,G} of L. However, if G = ⊥, then it includes the instances derived from the trace-based situation subset S_{L,⊥}.

Given a situation feature table T_{L,SFN,tf}, we can infer the structural equation model of its situation feature names. It can be provided by an expert who possesses process domain knowledge or can be inferred using several methods that already exist in the literature (e.g., [7,4]). Loosely speaking, a structural equation model is a data-generating model in the form of a set of equations. These equations are not normal equations but a way to determine how to generate the observational and interventional distributions. More formally:

Definition 7 (Structural Equation Model (SEM)).
Let T_{L,SFN,tf} be a situation feature table, where L ∈ U_L, SFN ⊆ U_sfn, and tf ∈ SFN. The SEM of T_{L,SFN,tf} is defined as EQ ∈
SFN → Expr(SFN), where for each sfn ∈ SFN, Expr(SFN) is an expression over the situation feature names in SFN and possibly some noise N_sfn. Moreover, the noise distributions N_sfn for all sfn ∈ SFN have to be mutually independent.
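Concretely, a SEM in the sense of Definition 7 can be sketched as one assignment per situation feature name plus an independent noise sampler, evaluated in an order compatible with the causal structure. The equations and noise ranges below are illustrative, not taken from any event log.

```python
# Sketch of a SEM: one equation per situation feature name plus an
# independent noise sampler, evaluated in an order compatible with the
# (acyclic) causal structure. Equations and noise ranges are illustrative.
import random

noise = {
    "A": lambda: random.uniform(0, 1),
    "B": lambda: random.uniform(0, 1),
}
equations = {
    # each right-hand side depends only on parents and the feature's noise
    "A": lambda v, n: n,
    "B": lambda v, n: 2 * v["A"] + n,  # A is the only parent of B
}
order = ["A", "B"]  # topological order of the causal structure

def sample():
    """Draw one instance from the observational distribution."""
    v = {}
    for f in order:
        v[f] = equations[f](v, noise[f]())
    return v
```

Evaluating the equations in topological order is what makes the acyclicity assumption below essential: with a loop, no such order would exist.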
Here, we assume that
SFN is causally sufficient, which means SFN includes all relevant situation feature names and there is no hidden common confounder for the situation feature names in
SFN. Also, we assume that the SEM does not include any loop, which means: given a SEM EQ over the SFN of a situation feature table T_{L,SFN,tf}, for each sfn ∈ SFN, the right-hand side of the expression sfn = Expr(SFN) in EQ does not include sfn. Given EQ, the situation feature names that appear in the right-hand side of the expression sfn = Expr(SFN) are called its parents; they are the situation feature names with a direct causal effect on it. The structure of the causal relationships between the situation feature names in a SEM can be encoded as a directed acyclic graph, called the causal structure. Given a SEM on a set of situation feature names SFN, each vertex in its corresponding causal structure corresponds to one of the situation feature names in
SFN, and there is a directed edge from sfn_1 to sfn_2 if sfn_1 appears in the right-hand side of the expression EQ(sfn_2), where sfn_1, sfn_2 ∈ SFN. (Definition 7 is based on [6].)

As an example, consider an IT company which implements software for its customers. This company does not maintain the released software. Figure 2 shows the
Petri-net model of this company. Each trace in the event log corresponds to the implementation process of one project. There are some cases for which the manager of the company believes the implementation phase should have taken less time, so the target situation feature name is the duration of the implementation phase. The manager believes that the following attributes of a case might

Fig. 2.
The process of the IT company in Section 5.1

have a causal effect on the duration of its implementation phase:
1. Priority, an attribute of the feasibility study activity, which indicates how urgent the software is for the customer.
2. Complexity, an attribute of the feasibility study activity, which indicates the hardness of a project.
3. The number of employees working on that project, recorded as an attribute of the team charter activity.
4. The amount of time, in person-days, spent on the product backlog activity for that project.

To avoid notationally complicated formulas, we denote the relevant situation feature names as follows:
– Complexity: (Complexity, G_1) where G_1 = {e ∈ E | e(actName) = Feasibility Study}
– Priority: (Priority, G_1).
– ProductBacklogDuration: (Duration, G_2) where G_2 = {e ∈ E | e(actName) = Product Backlog}
– NumEmployees: (Number of employees, G_3) where G_3 = {e ∈ E | e(actName) = Team Charter}
– ImplementationDuration: (SubModelDuration, ⊥).

ImplementationDuration is the target situation feature name and SFN = {Complexity, Priority, NumEmployees, ProductBacklogDuration, ImplementationDuration}. In Table 1 we provide two SEMs for the situation feature names in SFN: the upper one is a linear and the lower one a non-linear SEM. The causal structure of both of these SEMs is depicted in Figure 3.3. According to this causal structure, spending more or less time on the product backlog does not have any effect on the duration of the implementation phase. The reason is that there is no causal relation between them, even though they are correlated. (By implementation phase, we mean the sub-model including the two transitions "development" and "test", marked with a blue rectangle in Figure 2.)

Linear structural equation model
(· denotes a numeric constant omitted in this table)
Complexity = N_Complexity, with N_Complexity ∼ Uniform(1, ·)
Priority = N_Priority, with N_Priority ∼ Uniform(1, ·)
ProductBacklogDuration = Complexity + N_ProductBacklogDuration, with N_ProductBacklogDuration ∼ Uniform(−·, ·)
NumEmployees = Complexity + Priority + N_NumEmployees, with N_NumEmployees ∼ Uniform(−·, ·)
ImplementationDuration = Complexity + NumEmployees + N_ImplementationDuration, with N_ImplementationDuration ∼ Uniform(10, ·)

Non-linear structural equation model
Complexity = N_Complexity, with N_Complexity ∼ Uniform(1, ·)
Priority = N_Priority, with N_Priority ∼ Uniform(1, ·)
ProductBacklogDuration = Complexity + ⌊Complexity/·⌋ + N_ProductBacklogDuration, with N_ProductBacklogDuration ∼ Uniform(−·, ·)
NumEmployees = Complexity·√Complexity + Priority + N_NumEmployees, with N_NumEmployees ∼ Uniform(−·, ·)
ImplementationDuration = Complexity + Complexity·NumEmployees·√(NumEmployees − ·) + ·√(mod(NumEmployees, ·)) + N_ImplementationDuration, with N_ImplementationDuration ∼ Uniform(10, ·)
Table 1.
Linear (top) and nonlinear (bottom) SEM for the IT company in Section 3.3.
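The dependency structure of the linear SEM can be simulated as follows. Since not all numeric constants of Table 1 are reproduced above, the coefficients and noise bounds below are illustrative assumptions; only the parent sets follow the table.

```python
# Runnable version of the linear SEM's dependency structure. The concrete
# coefficients and noise bounds are illustrative assumptions; only the
# parent sets follow Table 1.
import random

def sample_case():
    c = random.uniform(1, 5)                    # Complexity
    p = random.uniform(1, 3)                    # Priority
    pb = c + random.uniform(-1, 1)              # ProductBacklogDuration
    ne = c + p + random.uniform(-1, 1)          # NumEmployees
    impl = c + ne + random.uniform(10, 15)      # ImplementationDuration
    return {"Complexity": c, "Priority": p, "ProductBacklogDuration": pb,
            "NumEmployees": ne, "ImplementationDuration": impl}

# ProductBacklogDuration never occurs in the equation for
# ImplementationDuration, yet both depend on Complexity; in sampled data
# they are therefore correlated without any causal link between them.
```

Sampling many cases from this model reproduces the phenomenon noted above: ProductBacklogDuration and ImplementationDuration correlate through their common cause Complexity, although neither appears in the other's equation.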
4 Method

Given a situation feature table T_{L,SFN,tf}, we distinguish one of its instances, with an undesirable value for the target situation feature name, as the current instance, denoted cinst. Moreover, using T_{L,SFN,tf}, we can infer the SEM of the situation feature names in SFN. We use this SEM to generate a set of counterfactual explanations for cinst. As it is futile to investigate the effect of assigning different values to situation feature names whose values cannot be altered by the user, we study only the effect of changing the values of those situation feature names that are modifiable by the user. We call the set of modifiable situation feature names actionable situation feature names and denote it by ASF.

A set of counterfactual explanations, which describes the reasons for the undesirable result in the current instance, is a set of alternative instances. More formally,
Definition 8 (Set of Counterfactual Explanations).
Let
SFN ⊆ U_sfn be a situation feature extraction plan, tf ∈ SFN be the target situation feature name, and
ASF ⊆ SFN, where
ASF ≠ ∅ and tf ∉ ASF, be a set of actionable situation feature names. Consider EQ to be a SEM over SFN and cinst : SFN → U_val the current instance. A set of counterfactual explanations is defined as Ex_{EQ,SFN,ASF,tf}(cinst) ⊆ SFN → U_val.

To generate the set of counterfactual explanations for the current instance cinst, we take the following three steps: generating candidates, predicting the value of the target situation feature name, and selecting a subset of candidates.
1. Generating candidates.
In the first step, we generate several candidates for the values that could have been assigned to the actionable situation feature names of the current instance. Each candidate is a member of ⋃_{A ⊆ ASF, A ≠ ∅} A → U_val. There are several ways to generate candidates. We do it by generating several candidates for each non-empty subset A ⊆ ASF such that, for half of the generated candidates, the value of each situation feature name sfn ∈ A is selected from the distribution of sfn in the situation feature table, and for the other half, it is selected randomly from values(sfn).

2. Predicting the value of the target situation feature name.
In this step we compute the effect, on the value of the target situation feature name (and possibly other situation feature names), of replacing the values of the situation feature names in the current instance with the values of the candidates generated in the previous step. If we do not have access to the SEM of the situation feature names in SFN, then we can use a machine learning technique, like a neural network, and train it using T_{L,SFN,tf}. For each candidate c generated in the previous step, we can generate an instance inst : SFN \ {tf} → U_val in which inst(sfn) = c(sfn) if sfn ∈ dom(c) and inst(sfn) = cinst(sfn) if sfn ∈ SFN \ (dom(c) ∪ {tf}). Then, we can use the trained model to predict the value of tf for that instance. But if we have access to the SEM EQ, then predicting the value of tf for each candidate involves three steps: abduction, action, and prediction [5]. We explain these steps using an example.

– Abduction. First we need to incorporate the observed data, cinst, into the model, EQ, and generate a counterfactual SEM that explains the conditions and the behavior of the system and the environment when cinst was happening. A counterfactual SEM, EQ′, is obtained by replacing the distribution of the noise terms in EQ with the corresponding noise distributions conditioned on SFN = cinst.

Example 1.
Suppose that, in the example mentioned in Section 3.3, we select as current instance {(Complexity, ·), (Priority, ·), (NumEmployees, ·), (ProductBacklogDuration, ·), (ImplementationDuration, ·)}, where · stands for a concrete value omitted here. The counterfactual SEM EQ′ then fixes the noise terms, yielding equations of the form:
Complexity = ·
Priority = ·
ProductBacklogDuration = · × Complexity + ·
NumEmployees = · × Complexity + · × Priority + ·
ImplementationDuration = · × Complexity + · × NumEmployees + ·

– Action. The second step is taking action toward enforcing changes in the counterfactual SEM EQ′, regarding candidate c. The result is a SEM EQ″ where sfn = c(sfn) if sfn ∈ dom(c) and sfn = EQ′(sfn) if sfn ∉ dom(c).

Example 2.
Suppose that we are interested in predicting the value of
ImplementationDuration for the candidate (
Priority, ·). The intervened SEM EQ″ is then of the form:
Complexity = ·
Priority = ·
ProductBacklogDuration = · × Complexity + ·
NumEmployees = · × Complexity + · × Priority + ·
ImplementationDuration = · × Complexity + · × NumEmployees + ·

– Prediction. The third step involves using the modified SEM to predict the counterfactual value of the target situation feature name. Prediction is simply done by computing the value (or the distribution) of the target situation feature name in the counterfactual SEM under the intervention, EQ″. In this step, we remove from the domain of c those situation feature names that do not play any role in determining the value of the target situation feature name regarding EQ″.

Example 3.
After computing the values of the situation feature names, we will have: {(Complexity, ·), (Priority, ·), (NumEmployees, ·), (ProductBacklogDuration, ·), (ImplementationDuration, ·)}.

3. Selecting a subset of candidates.
We look for a set of close candidates as explanations, which must satisfy four conditions. First, the predicted value of the target situation feature name in the generated instance must be desirable regarding a threshold t provided by the user; we call these candidates desirable candidates. Second, the domain of each candidate has the fewest possible number of situation feature names. Third, they are as close as possible to the current instance, according to a distance function d : U_instance × U_instance → R. Fourth, they differ from each other as much as possible regarding the set of situation feature names in their domain. For the distance function d(·, ·), we use the L1 metric on the normalized situation feature names. As mentioned in [11], using the L1 metric, more sparse explanations are generated. For diversity, we partition the desirable candidates based on their domain and then sort the desirable candidates in each partition according to their L1 distance to the current instance. Desirable candidates are selected one by one from different partitions, giving priority to partitions that have fewer situation feature names in their domains and from which the fewest desirable candidates have been selected so far.

5 Experimental Results

We implemented the proposed method in ProM, which is a free and expandable platform for process mining algorithms. The implemented plugin can be found under the name counterfactual explanation in the ProM nightly build. In the implemented plugin, we use the L1 norm on the normalized situation features and the L1 norm weighted by the inverse median absolute deviation [11] as the distance functions to sort the desirable candidates. Also, we can apply several classifiers (including Regression Tree (RT), Locally Weighted Learning (LWL), and Multi-layer Perceptron (NN)), as well as a structural equation model, to predict the value of the target situation feature name for each candidate.
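When the SEM itself is used as the predictor, the abduction, action, and prediction steps of Section 4 take a particularly simple form for a linear equation: recover the noise term from the observation, intervene, and re-evaluate. The toy SEM below (Y = 2X + N_Y) and its numbers are illustrative, not taken from the paper.

```python
# Abduction-action-prediction on a toy linear SEM Y = 2*X + N_Y.
# The equation and its coefficient are illustrative.
def counterfactual_y(x_obs, y_obs, x_new):
    n_y = y_obs - 2 * x_obs   # abduction: noise consistent with observation
    # action: intervene by setting X to the candidate value x_new;
    # prediction: re-evaluate Y under the intervened SEM, same noise
    return 2 * x_new + n_y
```

For example, observing (x = 3, y = 7) fixes n_y = 1, so had x been 1, the model predicts y = 3. Note that a purely observational predictor need not agree with this value, which is what the synthetic experiments below measure.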
For diversity and for optimizing the distance between the desirable candidates and the current instance, we use a hill climbing algorithm, implementing a method that combines the methods mentioned in [11] and [9]. We optimize the current instance toward close desirable candidates by changing the values of different subsets of actionable situation feature names (similar to [9]), and we also optimize the distance of several randomly generated desirable candidates to the current instance (similar to [11]).

To evaluate the explanations generated by this method, we apply it to two synthetic event logs with known linear and nonlinear SEMs and to one real event log, from the BPI Challenge 2017. In the following, we first describe the data generation method and the results of applying this method to the synthetic event logs, and then we discuss the results for the real event log. The goal of applying the implemented plugin to the synthetic event logs is to see how different the predicted counterfactual value of a counterfactual instance might be when predicted by the SEM versus by a machine learning technique. In this part of the experiment, we have not used optimization on the selected desirable counterfactual instances.

5.1 Synthetic event logs
For the synthetic event logs, we have used the example mentioned in Section 3.3. We have generated two different event logs: in the first, the duration of the implementation phase of a case depends linearly on the values of the other situation feature names, and in the second, non-linearly. The SEMs of the generated event logs are presented in Table 1. For each event log, we generated 1000 traces. Then, using the implemented plugin, we generate a set of 8 explanations for both the linear and the nonlinear scenario. In this experiment, ImplementationDuration is the target situation feature name and
SFN = {Complexity, Priority, NumEmployees, ProductBacklogDuration, ImplementationDuration}. We used the structural equation model to evaluate possible samples and select the desirable candidates. We then used the classifiers to predict the value of ImplementationDuration for the selected desirable candidates, and investigated how close the values predicted by the machine learning techniques are to the true values, as well as how diverse the generated explanations are in terms of the number of situation feature names whose values differ between the current instance and the explanations. As the current instance and the desirable value threshold, we used the following settings:

– Linear case.
The current instance was an instance in which
Complexity = ·, Priority = ·, NumEmployees = ·, ProductBacklogDuration = · (concrete values omitted), and ImplementationDuration =
577 and the desirable threshold was 500. – Non-linear case.
The current instance was an instance in which
Complexity = ·, Priority = ·, NumEmployees = ·, ProductBacklogDuration = ·, and ImplementationDuration = · (concrete values and threshold omitted).

The resulting accuracies were as follows:
– Linear case. The accuracies of the machine learning techniques RT, LWL, and NN on the data were 0.818, 0.990, and 0.984, respectively. However, their accuracy on the counterfactual instances dropped to 0.74, 0.77, and 0.76, respectively.
– Non-linear case.
The accuracies of the machine learning techniques RT, LWL, and NN on the data were 0.986, 0.989, and 0.95, respectively. However, their accuracies dropped on the counterfactual instances to 0.74, 0.25, and 0.79, respectively.

The results of applying the proposed method using the structural equation model and the three mentioned machine learning techniques are presented in Figure 3. Figure 3 a. and c. show the predicted ImplementationDuration when applying the selected desirable candidates using the SEM (red line), RT (blue line), LWL (green line), and NN (light green line); part a. covers the linear case and part c. the non-linear case. Figure 3 b. and d. show the number of effective situation feature names in the domain of the selected desirable candidates; part b. covers the linear case and part d. the non-linear case.

For the real event log, we have used the BPI Challenge 2017 event log. This event log includes 1,202,267 events pertaining to 31,509 bank loan applications. Each trace has several attributes indicating the loan goal, the requested amount, and the application type. The value of the application type can be either 'New credit' or 'Limit raising'. Also, each trace includes an event whose activity name is O_Create Offer. This event includes attributes that indicate the amount of the first withdrawal, the number of terms, the monthly cost of the offered loan, the credit score of the loan applicant, and whether the loan offer was selected by the loan applicant.

Another event log, including the information of 42,995 offers created for these loan applicants, has also been provided in BPI Challenge 2017. We have enriched the BPI Challenge 2017 event log by adding an attribute, numOffers, to each trace, which indicates the number of offers provided for that applicant (computed from the second event log).

The question is why some of the loans have been rejected by some of the loan applicants, and whether it was possible to prevent rejecting the loan offers.
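The enrichment step described above, counting the offers created for each application in the second log, can be sketched with pandas. This is a minimal sketch: the column name `application_id` and the toy data are illustrative assumptions, not the actual attribute names or contents of the BPI Challenge 2017 logs.

```python
import pandas as pd

# Minimal stand-ins for the two BPI 2017 logs: one row per loan
# application, and one row per created offer. The "application_id"
# column name is a hypothetical placeholder.
applications = pd.DataFrame({"application_id": ["A1", "A2", "A3"]})
offers = pd.DataFrame({"application_id": ["A1", "A1", "A2"]})

# Count the offers created for each application in the offer log.
counts = (offers.groupby("application_id").size()
                .rename("numOffers").reset_index())

# Enrich the application log; applications without any offer get 0.
applications = applications.merge(counts, on="application_id", how="left")
applications["numOffers"] = applications["numOffers"].fillna(0).astype(int)
# applications now carries numOffers per trace: A1 -> 2, A2 -> 1, A3 -> 0.
```

The left join keeps every application, so traces with no recorded offer still receive a numOffers value of 0 rather than being dropped.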
Fig. 3. The result of applying the implemented method on the synthetic event logs.

Let G = { e ∈ E | e(actName) = O_Create Offer }. We have used (selected, G) as the class situation feature, where selected is encoded as 1 and rejected as 0. In this experiment, we set SFN = { (loanGoal, ⊥), (requestedAmount, ⊥), (numOffers, ⊥), (applicationType, ⊥), (creditScore, G), (firstWithdrawal, G), (NumberOfTerms, G), (monthlyCost, G), (selected, G) }. The set of actionable situation feature names ASF is { (numOffers, ⊥), (firstWithdrawal, G), (NumberOfTerms, G), (monthlyCost, G) }. We have used an NN to predict the value of the target situation feature name. Here are two examples of the generated explanations.

When the current instance is the instance corresponding to the loan applicant 'Application 1959700739', we get the following explanations:
– By setting (NumberOfTerms, G) to 30 instead of 48, the offer would be selected.
– By setting (firstWithdrawal, G) to 4586.9 instead of 5250, the offer would be selected.

These explanations mean that if the first withdrawal amount or the number of terms in the offer created by the bank had been lower, then the applicant would probably have selected the offer. We turned the resulting counterfactual explanations into textual, human-readable explanations.

As another example, consider setting the current instance to the instance corresponding to 'Application 557515885'; then we get the following explanations:
– By setting (NumberOfTerms, G) to 30 instead of 48, the offer would be selected.
– By setting (firstWithdrawal, G) to 1000 instead of 8500, the offer would be selected.
– By setting (monthlyCost, G) to 863.928 instead of 200.85 and setting (numOffers, ⊥) to 2 instead of 1, the offer would be selected.

These explanations mean that an offer with a lower first withdrawal amount or a lower number of terms would probably have been more desirable to this loan applicant.

Regarding the results demonstrated in Figure 3, we can see that there is a gap between the values predicted by the machine learning techniques and those predicted by the structural equation model. Also, the accuracy of the classifiers drops dramatically when predicting the values of the counterfactual instances. This phenomenon can be explained by the difference between the mechanisms for predicting counterfactual values using a structural equation model and using a machine learning technique.

When evaluating an instance generated from a candidate, we first derive, using the current instance and the structural equation model of the process, a counterfactual structural equation model that captures how the environment affected the current instance (by replacing the noise terms with the noise terms derived from the current instance). The next step is intervening on the counterfactual SEM as dictated by the candidate. Finally, the value of the target situation feature name is predicted in the modified SEM. A machine learning technique, on the other hand, considers neither the behavior of the environment nor the effect of an intervention. The instance generated from the candidate and the current instance is regarded as a new instance, which mainly results in wrong predictions. If we remove the situation feature names that have no causal relationship with the target situation feature name from the situation feature table, the accuracy of the classifiers on the counterfactual instances remains more or less the same.
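The three-step counterfactual computation described above (abduction of the noise terms from the current instance, intervention, and prediction in the modified SEM) can be illustrated on a toy linear SEM. This is only a sketch: the equations X := N_X and Y := 2X + N_Y and all numbers are made-up assumptions, not the SEM used in the paper.

```python
# Toy linear SEM (hypothetical): X := N_X,  Y := 2*X + N_Y

def counterfactual_y(x_obs, y_obs, x_new):
    """Predict what Y would have been had X been x_new,
    given the observed instance (x_obs, y_obs)."""
    # 1. Abduction: derive the noise term from the observed instance,
    #    fixing the environment of this particular case.
    n_y = y_obs - 2 * x_obs          # since Y := 2*X + N_Y
    # 2. Action: intervene on the SEM, replacing X's equation by X := x_new.
    x_cf = x_new
    # 3. Prediction: propagate through the modified SEM with the same noise.
    return 2 * x_cf + n_y

# Observed instance: x = 3, y = 10, so the abducted noise is N_Y = 4.
# Had X been 1, Y would have been 2*1 + 4 = 6.
print(counterfactual_y(3, 10, 1))  # 6
```

A purely predictive model fitted to observational data would instead estimate something like E[Y | X = 1], ignoring the instance-specific noise term, which is one way to understand the accuracy gap on counterfactual instances reported above.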
This indicates that the difference in the predicted value of the class situation feature comes from the difference in the prediction mechanisms.

The difference in the number of effective situation feature names with different values between the current instance and a candidate comes from the fact that machine learning techniques do not distinguish between situation feature names that have a causal relationship with the target situation feature name and those that are merely correlated with it. Using the SEM, on the other hand, changes to the values of situation feature names that have no causal relationship with the target situation feature name are simply ignored in the counterfactual instances.

We have presented a method to generate case-specific explanations in the context of process mining. In this method, given an instance with an undesirable result for the target situation feature name and a threshold for the desirable result, a large number of random candidates is generated. In the next step, the value of the target situation feature name is predicted for the instances generated by applying the changes suggested by each candidate to the current instance. An optimization method is then used to create close desirable candidates using the current instance and some of the desirable candidates generated in the previous step. Finally, a set of close and, at the same time, diverse desirable candidates is selected as the set of explanations.

Here we have assumed that the applicability of a situation feature is case dependent and is known only by the user for whom the explanations are generated.
So, a set of diverse explanations can help the user decide which one works best for them.

The results of the evaluations on the synthetic event logs have shown that ignoring the causal relationships among the situation features results in explanations that suggest changing situation features with no causal effect on the value of the class situation feature. Moreover, using a machine learning technique, regardless of its accuracy, to predict the value of the target situation feature may result in imprecise or wrong explanations, or in missing some of the good explanations.

As future work, we plan to adapt this method to the online setting.

References
1. Bozorgi, Z.D., Teinemaa, I., Dumas, M., La Rosa, M., Polyvyanyy, A.: Process mining meets causal machine learning: Discovering causal rules from event logs. In: ICPM (2020)
2. Ferreira, D.R., Vasilyev, E.: Using logical decision trees to discover the cause of process delays from event logs. Computers in Industry, 194–207 (2015)
3. Hompes, B.F., Maaradji, A., La Rosa, M., Dumas, M., Buijs, J.C., van der Aalst, W.M.: Discovering causal factors explaining business process performance variation. In: International Conference on Advanced Information Systems Engineering. pp. 177–192. Springer (2017)
4. Narendra, T., Agarwal, P., Gupta, M., Dechu, S.: Counterfactual reasoning for process optimization using structural causal models. In: Proceedings of the Business Process Management Forum. vol. 360, pp. 91–106. Springer (2019). https://doi.org/
, 87 (2011)
9. Russell, C.: Efficient search for diverse coherent explanations. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. pp. 20–28 (2019)
10. Suriadi, S., Ouyang, C., van der Aalst, W.M., ter Hofstede, A.H.: Root cause analysis with enriched process logs. In: International Conference on Business Process Management. pp. 174–186. Springer (2012)
11. Wachter, S., Mittelstadt, B., Russell, C.: Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech. 31