Discovering Business Area Effects to Process Mining Analysis Using Clustering and Influence Analysis
Teemu Lehto and Markku Hinkka
QPR Software Plc, Finland
Aalto University, School of Science, Department of Computer Science, Finland
Abstract.
A common challenge for improving business processes in large organizations is that business people in charge of the operations lack a fact-based understanding of the execution details, process variants, and exceptions taking place in business operations. While existing process mining methodologies can discover these details based on event logs, it is challenging to communicate the process mining findings to business people. In this paper, we present a novel methodology for discovering business areas that have a significant effect on the process execution details. Our method uses clustering to group similar cases based on process flow characteristics and then influence analysis for detecting those business areas that correlate most with the discovered clusters. Our analysis serves as a bridge between BPM people and business people, facilitating the knowledge sharing between these groups. We also present an example analysis based on publicly available real-life purchase order process data.
Keywords: process mining, clustering, influence analysis, contribution, business area, classification rule mining, data mining
Process mining helps organizations to improve their operations by providing valuable information about the business processes in an easy-to-understand visual flowchart format based on transactional data in ERP systems. However, the data extracted from ERP systems often contains different kinds of objects, like 'apples and oranges', that should be analyzed separately in order to provide meaningful results. Using the procurement process as an example: the purchase order database tables may contain several different kinds of purchase order items like services, equipment, raw materials, software licenses, high-cost items, free items, headquarter purchases, plant maintenance costs, manually approved items, and automatic replenishment purchases. Without appropriate tools, the process analyst needs to either a) analyze all items separately, leading to a potentially massive amount of work, b) analyze all items at the same time, leading to potentially meaningless results, or c) rely on subjective information, like asking business people which of the items should be analyzed separately or relying on their own intuition. The techniques presented in this paper help the analyst to discover those business areas (classification rules) that seem to have a major effect on the business process flow. These business areas are based on the case attribute characteristics of the cases and are thus easy to understand for the business people. Discovered business areas can be used to effectively guide the process mining analysis further in a divide & conquer manner.

In this paper, we present methods to answer these three questions:
– How can a business process be analyzed based on the process flow of individual process instances in order to discover business-relevant clusters in such a way that a business analyst can easily understand the clustering results and use them for further analysis?
– How to find business areas that have a major effect on process flow behavior?
– How to further consolidate business area results to discover case attributes that have a significant effect on process flow behavior?

The rest of this paper is structured as follows: Section 2 is a summary of the latest developments. Section 3 presents our methodology for Discovering Business Area Effects. Section 4 is a case study with real-life purchase order process data. Section 5 shows limitations and Section 6 draws the final conclusions.
Process mining is an active research area that analyzes business processes based on the event log data from IT systems in order to discover, monitor, and improve processes [16]. Process mining typically focuses on discovering the process flowchart as a control flow diagram, Petri net, or BPMN diagram. Other process mining types include conformance checking and enhancement. Root cause analysis as part of process mining has been studied in [14] as well as in our previous works [8] and [9].

One key challenge in process mining is that a single event log may often contain many different processes, in which case trying to discover a single process diagram for the whole log file is not a working solution. In the process mining context, clustering has been studied extensively with excellent results [5], [12], [3] and [15]. These previous works cover the usage of several distance measures like Euclidean, Hamming, Jaccard, Cosine, Markov chain, and Edit-Distance as well as several clustering approaches like partitioning, hierarchical, density-based and neural network. However, most of the previous research related to clustering within the process mining field has been directly focused on the process flowchart discovery, with the prime objectives categorized as Process, Variant or Outlier Identification, Understandability of Complexity, Decomposition, or Hierarchization. In practice, this means that clustering has been used as a tool for making other process mining methods, like control flow discovery, work better, i.e., clustering has divided the event log into smaller sub-logs that have been directly used for further analysis. In this paper, we show how to use clustering for discovering those business areas that have a significant effect on process behavior. Yet another use case for clustering in the process mining field has been to perform structural feature selection in order to improve the prediction accuracy and performance [6].

Some recent research has started to address the challenge of how to explain the clustering results to business analysts [2]. It has been presented that when explaining the characteristics of clusters to business analysts, the role of case attributes becomes more important [11]. We show an easy-to-understand representation for showing cluster characteristics based on the difference of densities and case attribute information.

Substantial effort has also been spent in the process mining community to discover branching conditions from business process execution logs [4]. This has also led to the introduction of decision models and decision mining [1] as well as a standard Decision Model and Notation (DMN) [10]. While the objective of decision modeling is to provide additional details into individual branching conditions, our approach is to analyze the effect of any business area on the whole structure of the process flow, not just one decision branch at a time.
In this section, we present our methodology for Discovering Business Area Effects To Process Mining Analysis Using Clustering and Influence Analysis. Our approach is to do the clustering using process flow features and then use influence analysis to find those business areas that have the highest contribution for certain kinds of cases ending up in distinct clusters. If all process instance-specific business area values derived using any given case attribute are distributed randomly, then the contribution measure for each business area is close to zero, and the information for the analyst is that the particular case attribute does not correlate with the way the clusters are formed. According to our methodology, it then means that the particular case attribute has no influence on the process flow behavior. In summary, our method finds those business areas and case attributes that have the highest contribution to the process flow behavior.
To identify those business areas that have the strongest effect on the process execution, we first run clustering using relevant features representing the process execution characteristics. These features have been widely studied in trace clustering papers [12], [15], [6] and [7]. Clustering is a trade-off between quality and performance. As the number of features is increased, the quality of the results potentially improves while performance gets slower.
– Activity profile: This profile contains one feature for each Event Type label in the data. The value of this feature is related to the number of occurrences of that particular event type within the case. If the number of occurrences is used as an exact value, then the clustering algorithm needs to take the continuous values into account, i.e., repeating activity A seven times is much more similar to repeating it six or eight times than to repeating it only twice. One approach is to use value zero if the Event Log contains no occurrences of the Event Type for the given case and one if the log contains one or more occurrences. While this approach often works well, it may not be able to detect the repeating of a given Event Type multiple times within a case. For this reason, we recommend using value zero for no occurrences of the Event Type, one for exactly one occurrence, and two for two or more occurrences.
– Transition profile: The transition profile captures all process flows from every activity to the next activity. In effect, it contains the process control flow information. The transition profile potentially provides a large number of features, up to the square of the number of Event Types plus one for start and end transitions. For example, in the sample analysis presented in Section 4, we have 42 distinct event types, giving potentially $43^2 = 1849$ distinct transitions. Luckily, the control flow for 251,734 cases only contains 676 distinct transitions. Because the number of transition features is high, we recommend using the coding zero if the transition does not occur in the case and one if it occurs once or more.
Clustering Algorithms
A comparative analysis of process instance clustering techniques has been presented in [15] and shows how various clustering techniques have been used to separate different process variants from a large set of cases as well as to reduce complexity by grouping similar cases into the same clusters. Considering our method, the main functional requirement for the clustering algorithm is that it needs to put cases with similar process flow behavior into the same clusters, and all 20 approaches listed in [15] meet this requirement. If a particular clustering algorithm produces meaningful results and if there indeed is a correlation with a particular business area, then our method gives very high contribution values for that business area. If the clustering algorithm does not work perfectly but is still capable, to some extent, of grouping similar cases together, then the contribution values are still likely to show the most significant business areas among the top contributors. The essential non-functional requirement for the clustering algorithm is performance, i.e., the ability to produce results fast with a small amount of memory. With these considerations, we have received good results with the algorithms and parameters below:
– One-hot encoding. Since our Activity and Transition feature profiles only include the categorical values zero, one, and two, it is possible to use efficient one-hot encoding. This results in a maximum of $(n(EventTypes) + 1)^2 + 2 \cdot n(EventTypes)$ features.
– Hamming distance is the natural choice as the distance function with binary data like one-hot encoded features, because it completely avoids the floating-point distance calculations needed for the common Euclidean distance measure.
– K-modes clustering algorithm is suitable for categorical data. In our tests, k-modes produced well-balanced clusters and was fast to execute. The result of k-modes depends on the initialization of the cluster centers. We also tested agglomerative clustering algorithms, but they produced highly unbalanced clusters.
– Number of clusters has a significant effect on the clustering. To discover the business areas, clustering should be done several times with different numbers of clusters. We found that clustering four times with 2, 3, 5, and 10 clusters gave enough variation to provide meaningful results. When the number of clusters is less than five, the large business areas correlate more with the clustering. When clustering to 10 or more clusters, the smaller business areas like Vendor, Customer, and Product, which have more distinct values, correlate more with the clusters. Running the clustering several times is also an easy way to mitigate the random behavior of k-modes coming from initialization.
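The activity profile, the transition profile, and the Hamming distance described above can be sketched as follows. This is only a minimal illustration on a made-up event log: the function names, the `<start>`/`<end>` markers, and the toy activities are assumptions and not part of the original method description. The sketch compares the categorical codes directly rather than one-hot columns.

```python
from collections import Counter

START, END = "<start>", "<end>"  # hypothetical markers for start and end transitions


def activity_profile(trace, event_types):
    """Code each event type as 0 (absent), 1 (once) or 2 (two or more times)."""
    counts = Counter(trace)
    return [min(counts[e], 2) for e in event_types]


def transition_profile(trace, transitions):
    """Code each direct transition as 0 (absent) or 1 (occurs at least once)."""
    path = [START, *trace, END]
    seen = set(zip(path, path[1:]))
    return [1 if t in seen else 0 for t in transitions]


def hamming(u, v):
    """Number of positions where two equally long feature vectors differ."""
    return sum(x != y for x, y in zip(u, v))


# Toy event log (illustrative only): case id -> ordered list of event types.
log = {
    "case1": ["Create", "Receive", "Invoice", "Clear"],
    "case2": ["Create", "Receive", "Receive", "Invoice", "Clear"],
    "case3": ["Create", "Delete"],
}

# Feature dimensions are taken from what actually occurs in the log.
event_types = sorted({e for trace in log.values() for e in trace})
transitions = sorted(
    {t for trace in log.values() for t in zip([START, *trace], [*trace, END])}
)

features = {
    case: activity_profile(trace, event_types) + transition_profile(trace, transitions)
    for case, trace in log.items()
}

# case1 and case2 differ only by the repeated "Receive"; case3 follows another path.
d12 = hamming(features["case1"], features["case2"])
d13 = hamming(features["case1"], features["case3"])
```

In this toy log the repeated activity changes one activity feature and one transition feature, so `d12` stays small, while the deleted order in `case3` differs in most dimensions.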
Examples of business area dimensions include: company code, product line, sales unit, delivery team, geographical location, customer group, product group, branch offices, request category and diagnosis code. All the case attributes that are relevant to business can be used as business area dimensions as such, for example, product code. However, a large organization may easily have thousands of low-level product codes in their ERP system, so it is beneficial to have access to the product hierarchy and use each level as a separate business area dimension. Another example of a derived business area dimension is when a case attribute like Logistics Manager Name can be used to identify the Delivery Team. We again suggest having both the Logistics Manager and Delivery Team as business area dimensions; if one particular Logistics Manager has many cases and a major effect on process flow behavior, then our method will show that person as the most significant business area in the Logistics Manager dimension. The third example of derived business areas is to utilize the event attributes. For example, the Logistics Manager Name may be stored as an attribute value for the Delivery Planning Done activity. If there is always at most one Delivery Planning Done activity, then the attribute value can be used as such on the case level. If there are multiple Delivery Planning Done activities, then typical options include: use the first occurrence, use the last occurrence, or use a concatenated comma-separated list of all distinct values from the activities as the value on the case level. The outcome of forming business area dimensions is a list of case-level attributes that contain a specific (possibly empty) business area value for each case. To continue with our formal methodology, we now consider these business area dimensions as case attributes and the case attribute values as the corresponding business areas.
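The three options for lifting an event attribute to the case level (first occurrence, last occurrence, concatenated distinct values) can be sketched as below. The function name `case_level_value`, the dictionary keys, and the sample events are hypothetical and chosen only for illustration.

```python
def case_level_value(events, activity, attribute, mode="last"):
    """Derive one case-level value from an event attribute.

    events: ordered list of dicts, each with an "activity" key.
    Returns an empty string when the activity or the attribute is missing.
    """
    values = [
        e[attribute]
        for e in events
        if e["activity"] == activity and e.get(attribute)
    ]
    if not values:
        return ""
    if mode == "first":
        return values[0]
    if mode == "last":
        return values[-1]
    if mode == "concat":
        # Distinct values in order of first appearance, comma-separated.
        return ",".join(dict.fromkeys(values))
    raise ValueError(f"unknown mode: {mode}")


# Illustrative events of a single case (made-up data).
events = [
    {"activity": "Create Order"},
    {"activity": "Delivery Planning Done", "logistics_manager": "Alice"},
    {"activity": "Delivery Planning Done", "logistics_manager": "Bob"},
    {"activity": "Delivery Planning Done", "logistics_manager": "Bob"},
]

first = case_level_value(events, "Delivery Planning Done", "logistics_manager", "first")
last = case_level_value(events, "Delivery Planning Done", "logistics_manager", "last")
combined = case_level_value(events, "Delivery Planning Done", "logistics_manager", "concat")
missing = case_level_value(events, "Goods Issue", "logistics_manager")
```

Which option fits best depends on the process; the concatenated variant keeps all information but can produce many rare business area values.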
Interestingness Measures
We now present the definitions for interestingness measures used for finding the business areas that correlate with the clustering results. Let $C = \{c_1, \ldots, c_N\}$ be the set of cases in the process analysis. Each case represents a single business process execution instance. Let $P = \{p_1, \ldots, p_N\}$ be a set of clusters, each formed by clustering the cases in $C$. $C_p = \{c_{p_1}, \ldots, c_{p_N}\}$ is the set of cases belonging to cluster $p$, with $C_p \subseteq C$. Similarly, $C_a = \{c_{a_1}, \ldots, c_{a_N}\}$ is the set of cases belonging to the same business area $a$, i.e., they have the same value for the case attribute $a$.

Definition 1. Let Density $\rho(a, C) = \frac{n(C_a)}{n(C)}$, where $n(C_a)$ is the total number of cases belonging to the business area $a$ and $n(C)$ is the total number of all cases in the whole process analysis. Similarly, the Density $\rho(a, C_p) = \frac{n(C_p \cap C_a)}{n(C_p)}$ is the density of cases belonging to the business area $a$ within the cluster $p$.

Definition 2. Let $\text{Contribution\%}(a \to p) = \rho(a, C_p) - \rho(a, C) = \frac{n(C_p \cap C_a)}{n(C_p)} - \frac{n(C_a)}{n(C)}$ be the extra density of cases belonging to the business area $a$ in the cluster $p$ compared to the average density.

If business area $a$ is equally distributed to all clusters, then $\text{Contribution\%}(a \to p)$ is close to zero in each cluster. If the business area $a$ is a typical property in a particular cluster $p_i$ and a rare property in other clusters, then $\text{Contribution\%}(a \to p_i)$ is positive and the other values $\text{Contribution\%}(a \to p_j)$, where $j \neq i$, are negative. Within one clustering, the sum of all Contribution% values weighted by the relative cluster sizes $\frac{n(C_p)}{n(C)}$ is always zero, so the extra density in some clusters is always balanced by the smaller than average density in other clusters.

We now want to find the business areas that have a high contribution in many clusterings. We define:

Definition 3. Let $\text{BusinessAreaContribution}(a) = \sum_{p_i \in P} \frac{n(C_{p_i})}{n(C)} \left(\max\{\text{Contribution\%}(a \to p_i), 0\}\right)^2$.

Here we sum the weighted squares of all positive contributions the business area $a$ has with any cluster $p_i$. Positive values of $\text{Contribution\%}(a \to p_i)$ indicate a positive correlation between the business area $a$ and the particular cluster $i$, while negative values indicate that the business area $a$ has a smaller than average density in the cluster $i$. We found that using only the positive correlations gives more meaningful results when consolidating to the business area level. Since a few high contributions are relatively more important than many small contributions, we use the variance of the density differences, i.e., we take the square of $\text{Contribution\%}(a \to p_i)$. Since a contribution within a small cluster is less important than a contribution in a large cluster, we also use the cluster-size-based weight $\frac{n(C_p)}{n(C)}$. Any particular business area $a$ may have a substantial contribution in some clusters and a small contribution in others, so the sum over all these clusters gives the overall correlation between business area $a$ and all clusters $p_i \in P$.

We use the term business area in this paper for any combination of a process mining case attribute and a distinct value for that particular case attribute. BusinessAreaContribution thus identifies the individual case attribute-value combinations that have the highest effect on clustering results. The results can also be consolidated further to the case attribute level, as Definitions 4 and 5 describe.
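Definitions 1 to 3 can be read as the following small computation. This is a sketch under stated assumptions: the function names, the toy cases, and the single two-cluster clustering are made up for illustration, and the squared positive contribution follows the wording "weighted squares of all positive contributions".

```python
def density(area_of, cases, area):
    """Definition 1: share of the given cases that belong to the business area."""
    return sum(1 for c in cases if area_of[c] == area) / len(cases)


def contribution(area_of, all_cases, cluster, area):
    """Definition 2: density inside the cluster minus the overall density."""
    return density(area_of, cluster, area) - density(area_of, all_cases, area)


def business_area_contribution(area_of, all_cases, clusters, area):
    """Definition 3: size-weighted sum of squared positive contributions."""
    return sum(
        len(cluster) / len(all_cases)
        * max(contribution(area_of, all_cases, cluster, area), 0.0) ** 2
        for cluster in clusters
    )


# Toy data (illustrative only): ten cases with one case attribute value each.
all_cases = [f"c{i}" for i in range(1, 11)]
area_of = {c: "Consignment" for c in all_cases[:3]}
area_of.update({c: "Standard" for c in all_cases[3:]})

# One clustering into two clusters; clusters from further runs could be appended.
clusters = [all_cases[:4], all_cases[4:]]

contrib_p1 = contribution(area_of, all_cases, clusters[0], "Consignment")
contrib_p2 = contribution(area_of, all_cases, clusters[1], "Consignment")
bac_consignment = business_area_contribution(area_of, all_cases, clusters, "Consignment")
bac_standard = business_area_contribution(area_of, all_cases, clusters, "Standard")

# Within one clustering, the size-weighted contributions cancel out.
weighted_sum = sum(
    len(cl) / len(all_cases) * contribution(area_of, all_cases, cl, "Consignment")
    for cl in clusters
)
```

Here all three Consignment cases sit in the first cluster, so that cluster shows a clearly positive contribution while the second cluster shows a negative one that is ignored by Definition 3.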
Definition 4. Let $AT = \{at_1, \ldots, at_N\}$ be a set of case attributes in the process analysis. Each case $c_i \in C$ has a value $at_{j_{c_i}}$ for each case attribute $at_j \in AT$. $at_{j_{c_i}}$ is the value of case attribute $at_j$ for case $c_i$, and $V_{at_j} = \{v_{at_{j_1}}, \ldots, v_{at_{j_N}}\}$ is the set of distinct values that the case attribute $at_j$ has in the process analysis.

Definition 5. Let $\text{CaseAttributeContribution}(at_j)$ be the sum of all BusinessAreaContributions from all the business areas corresponding to the given case attribute $at_j$: $\sum_{v_{at_{j_i}} \in V_{at_j}} \text{BusinessAreaContribution}(at_{j_{v_{at_{j_i}}}})$, where $at_{j_{v_{at_{j_i}}}}$ denotes the business area in which the case attribute $at_j$ has the value $v_{at_{j_i}}$.

In this section, we apply our method to the real-life purchase order process data from a large multinational company in the Netherlands operating in the area of coatings and paints. The data is publicly available as the BPI Challenge 2019 [17] dataset. We made the following choices:
– Source data: We imported the data from the XES file as such without any modifications. To keep the execution times short, we experimented with the effect of running the analysis with a sample of the full dataset. Our experiments showed that the results remained consistent for sample sizes of 10,000 cases and more. With a sample size of 1,000 cases, the results of the individual analysis runs started to change, so we decided to keep the sample size at 10,000 cases.
– Clustering algorithm: We used the k-modes clustering as implemented in the Accord.NET Machine Learning Framework [13] with one-hot encoding and the Hamming distance function. To take into account the different numbers of clusters, we performed clustering four times, fixed to two, three, five, and ten clusters.
– Activity profile features for clustering: We used our default activity profile, which creates one feature dimension for each activity; the value is zero if the activity does not occur in the case, one if the activity occurs once, and two if it is repeated multiple times. There were 37 different activities in the sample, and the Top 20 activity profile is shown in Table 1.
– Transition profile features for clustering: Using a typical process mining analysis to discover the process flow diagram, we discovered 376 different direct transitions, including 13 starting activities, 22 ending activities, and 341 direct transitions between two unique activities. All of these 376 features were used as dimensions for clustering in a similar way as the activity profile, i.e., boolean value zero if the transition did not occur in the case and one if it occurred once or multiple times.
– Business area dimensions: Since we did not have any additional information or hierarchy tables concerning possible business areas, we used all 15 available distinct case attributes listed in Table 4 as business area dimensions. These case attributes have a total of 9901 distinct values, giving us 9901 business areas.
Table 1. Activity profile: Top 20 activities ordered by unique occurrence count

Name                                  Unique Count   Count
Create Purchase Order Item                  10 000  10 000
Record Goods Receipt                         9 333  13 264
Record Invoice Receipt                       8 370   9 214
Vendor creates invoice                       8 310   8 901
Clear invoice                                7 245   7 704
Remove Payment Block                         2 223   2 272
Create Purchase Requisition Item             1 901   1 901
Receive Order Confirmation                   1 321   1 321
Change Quantity                                707     853
Change Price                                   443     498
Delete Purchase Order Item                     338     339
Cancel Invoice Receipt                         251     271
Vendor creates debit memo                      244     253
Record Service Entry Sheet                     232  10 326
Change Approval for Purchase Order             194     319
Change Delivery Indicator                      112     128
Cancel Goods Receipt                           109     136
SRM: In Transfer to Execution Syst.             42      57
SRM: Awaiting Approval                          42      50
SRM: Complete                                   42      50
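The repeated clustering runs with different numbers of clusters can be illustrated with a compact k-modes loop. This is a simplified pure-Python sketch and not the Accord.NET implementation used in the case study; the function names, the seeding strategy, and the toy feature rows are assumptions.

```python
import random
from collections import Counter


def hamming(u, v):
    """Number of positions where two equally long feature tuples differ."""
    return sum(x != y for x, y in zip(u, v))


def k_modes(rows, k, n_iter=20, seed=0):
    """Minimal k-modes: assign rows to the nearest mode, then recompute the modes."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(rows))  # initial modes are distinct rows
    modes = rng.sample(distinct, min(k, len(distinct)))
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old mode if a cluster became empty
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0] for column in zip(*members)
                )
    return labels, modes


# Toy feature rows (illustrative only): two clearly different flow patterns.
rows = [(1, 1, 0, 0)] * 3 + [(0, 0, 1, 1)] * 3
labels, modes = k_modes(rows, k=2)

# Pool the clusters of several runs. The case study used k = 2, 3, 5 and 10;
# the toy data has only two distinct rows, so k is capped at 2 here.
pooled = []
for k in (2, 3):
    run_labels, _ = k_modes(rows, k, seed=k)
    for j in sorted(set(run_labels)):
        pooled.append([i for i, lab in enumerate(run_labels) if lab == j])
```

The pooled list of clusters is what the business area consolidation would then iterate over, with each cluster weighted by its share of all cases.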
Table 2 shows the results of clustering to a fixed five clusters. We see that the first cluster contains 48% of cases, the second cluster 33%, the third 17%, and both the 4th and 5th one percent each. Here we show the five most important business areas based on the Contribution%, which is calculated as the difference between the cluster-specific density of that business area and the total density. These results already give hints about the meaningful characteristics in the whole dataset, i.e., Cluster 1 contains many Standard cases from spend areas related to Sales, Products for Resale and NPR. On the other hand, Cluster 2 contains a more than average amount of cases from spend area Packaging, related to Labels and PR. VendorID 0120 seems to be highly associated with the process flow characteristics of Cluster 2. Cluster 3 is dominated by Consignment cases. Cluster 4 contains many Metal Containers & Lids cases as well as cases from two individual vendors listed in Table 2. Further analysis of the top five business areas listed as characteristics for each cluster confirms that these business areas indeed give a good overall idea of the cases allocated into each cluster.

Table 2. Clustering results based on Contribution

Business Area a                                   Cluster Density  Total Density  Contribution
Cluster 1 (48% cases)
  Spend area text = Sales                                    0.36           0.26          0.11
  Sub spend area text = Products for Resale                  0.34           0.24          0.11
  Spend classification text = NPR                            0.41           0.32          0.10
  Item Type = Standard                                       0.96           0.87          0.09
  Item Category = 3-way match, invoice before GR             0.95           0.88          0.07
Cluster 2 (33% cases)
  Spend area text = Packaging                                0.65           0.44          0.21
  Sub spend area text = Labels                               0.39           0.24          0.16
  Spend classification text = PR                             0.79           0.66          0.13
  Name = vendor 0119                                         0.14           0.05          0.08
  Vendor = vendorID 0120                                     0.14           0.05          0.08
Cluster 3 (17% cases)
  Item Category = Consignment                                0.33           0.06          0.27
  Item Type = Consignment                                    0.33           0.06          0.27
  Name = vendor 0185                                         0.09           0.02          0.08
  Vendor = vendorID 0188                                     0.09           0.02          0.08
  Item = 10                                                  0.33           0.26          0.07
Cluster 4 (1% cases)
  Sub spend area text = Metal Containers & Lids              0.19           0.08          0.11
  Name = vendor 0393                                         0.09           0.01          0.08
  Vendor = vendorID 0404                                     0.09           0.01          0.08
  Name = vendor 0104                                         0.11           0.04          0.07
  Vendor = vendorID 0104                                     0.11           0.04          0.07
Cluster 5 (1% cases)
  Spend classification text = NPR                            0.59           0.32          0.27
  Spend area text = Sales                                    0.41           0.26          0.15
  GR-Based Inv. Verif. = TRUE                                0.21           0.06          0.15
  Item Category = 3-way match, invoice after GR              0.21           0.06          0.15
  Sub spend area text = Products for Resale                  0.38           0.24          0.14

We clustered four times for fixed cluster amounts of 2, 3, 5 and 10, yielding a total of 20 clusters, and then consolidated the results into the business area level using Definition 3. The top 20 of all these 9901 business areas ordered by their respective BusinessAreaContribution are shown in Table 3. Clearly the business areas Item Category = Consignment and Item Type = Consignment have the most significant effect on the process flow. Looking at the actual process model, we see that Consignment cases completely avoid three of the five most common activities in the process, namely Record Invoice Receipt, Vendor creates invoice and Clear Invoice. Similarly, the business area Spend area text = Packaging also has a high correlation with process flow characteristics. Analysis of the process model shows that, for example, 23% of Packaging cases contain the activity Receive Order Confirmation compared to only 5% of the other cases. Further analysis of all the business areas listed in Table 3 shows that each of these areas has some distinctive process flow behavior that is more common in that area compared to the other business areas.
Table 3. Top 20 Business areas with major effect to process flow

Business Area a                                   Contribution  nCases n(C_a)
Item Category = Consignment                              0.051            576
Item Type = Consignment                                  0.051            576
Spend area text = Packaging                              0.040           4382
Spend classification text = NPR                          0.024           3175
Sub spend area text = Labels                             0.022           2351
Spend area text = Sales                                  0.021           2574
Item Type = Standard                                     0.021           8740
Sub spend area text = Products for Resale                0.021           2390
Spend classification text = PR                           0.019           6574
Item Category = 3-way match, invoice before GR           0.017           8760
Spend area text = Logistics                              0.013            210
Item Type = Service                                      0.013            244
Item = 1                                                 0.012            342
GR-Based Inv. Verif. = TRUE                              0.012            623
Item Category = 3-way match, invoice after GR            0.012            625
Name = vendor 0119                                       0.007            549
Vendor = vendorID 0120                                   0.007            549
Sub spend area text = Road Packed                        0.006            145
Name = vendor 0185                                       0.004            163
Vendor = vendorID 0188                                   0.004            163

Finally, Table 4 consolidates individual business areas into the case attribute level. Item Type with six distinct values and Item Category with four distinct values have the most significant effects on process flow characteristics. To confirm the validity of these results, we further analyzed the materials provided on the BPI Challenge 2019 website, including the background information and submission reports [17]. It is clear that Item Type and Item Category indeed can be regarded as the most important factors explaining the process flow behavior, as they are specifically mentioned to roughly divide the cases into four types of flows in the data. It is also interesting to see that Spend area text and Sub spend area text have a significant effect on the process flow even though they have a much higher number of distinct values (19 and 115) compared to Spend classification text, which only has four distinct values.
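The consolidation of Definition 5 can be checked against the published numbers by summing the Table 3 contributions per case attribute and comparing them with Table 4. The sketch below copies the rounded values from the two tables; since Table 3 lists only the top 20 business areas, the partial sums are lower bounds, and the tolerance for rounding is an assumption.

```python
from collections import defaultdict

# (case attribute, value, BusinessAreaContribution), copied from Table 3.
table3 = [
    ("Item Category", "Consignment", 0.051),
    ("Item Type", "Consignment", 0.051),
    ("Spend area text", "Packaging", 0.040),
    ("Spend classification text", "NPR", 0.024),
    ("Sub spend area text", "Labels", 0.022),
    ("Spend area text", "Sales", 0.021),
    ("Item Type", "Standard", 0.021),
    ("Sub spend area text", "Products for Resale", 0.021),
    ("Spend classification text", "PR", 0.019),
    ("Item Category", "3-way match, invoice before GR", 0.017),
    ("Spend area text", "Logistics", 0.013),
    ("Item Type", "Service", 0.013),
    ("Item", "1", 0.012),
    ("GR-Based Inv. Verif.", "TRUE", 0.012),
    ("Item Category", "3-way match, invoice after GR", 0.012),
    ("Name", "vendor 0119", 0.007),
    ("Vendor", "vendorID 0120", 0.007),
    ("Sub spend area text", "Road Packed", 0.006),
    ("Name", "vendor 0185", 0.004),
    ("Vendor", "vendorID 0188", 0.004),
]

# CaseAttributeContribution values copied from Table 4.
table4 = {
    "Item Type": 0.086,
    "Item Category": 0.080,
    "Spend area text": 0.077,
    "Sub spend area text": 0.056,
    "Spend classification text": 0.043,
    "Name": 0.025,
    "Vendor": 0.025,
    "Item": 0.016,
    "GR-Based Inv. Verif.": 0.012,
}

# Definition 5 restricted to the business areas visible in Table 3.
partial = defaultdict(float)
for attribute, _value, contrib in table3:
    partial[attribute] += contrib

ROUNDING = 0.002  # assumed slack for the three-decimal rounding in the tables
consistent = all(partial[a] <= table4[a] + ROUNDING for a in partial)
```

For attributes whose significant values all appear in Table 3, such as Item Category and Spend classification text, the partial sum already matches the Table 4 value.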
Table 4. Case Attributes ordered by effect on process flow

Case Attribute at            Contribution  Distinct Values n(V_at)
Item Type                           0.086                        6
Item Category                       0.080                        4
Spend area text                     0.077                       19
Sub spend area text                 0.056                      115
Spend classification text           0.043                        4
Name                                0.025                      798
Vendor                              0.025                      840
Item                                0.016                      167
GR-Based Inv. Verif.                0.012                        2
Purchasing Document                 0.002                     7937
Document Type                       0.000                        3
Goods Receipt                       0.000                        2
Company                             0.000                        2
Source                              0.000                        1
Purch. Doc. Category name           0.000                        1

Forming business area dimensions is an essential step in our method. However, some relevant business areas may consist of several dimensions; for example, the process flow behavior could be very distinctive in a particular combination of business areas SalesOffice=Spain and ProductGroup=Computers. Automatically detecting this kind of significant combined business areas would be a useful feature. Another limitation is that the process flow behavior does not take into account the performance profile, i.e., the lead times between individual activities and the total case duration. Although the usage of this kind of numerical information would require a more advanced clustering technique, the influence analysis part of the method presented in this paper would already handle the discovery of related business areas.
In this paper, we have presented a method for discovering those business areas that have a significant effect on process flow behavior based on clustering and influence analysis. As a summary of our findings:
– Our presented method is capable of discovering those business areas that have the most significant effect on the process execution. Our method provides valuable information to business people who are very familiar with case attributes and attribute values but not so familiar with the often technical event type names extracted from transactional system log files.
– Our method supports any available trace clustering method. Our case study shows that using the k-modes clustering algorithm with activity and transition profiles provides good results.
– Clustering makes the analysts realize that not all the cases in the process model are similar. Using the Contribution% measure works well for explaining the clustering results to business people.
– The case study presented in this paper confirms that the identified business areas indeed have distinctive process flow behavior, for example missing activities, a higher than average amount of some special activities, or a distinctive execution sequence for activities. Using our method, the business analyst may now divide the process model into smaller subsets and analyze them separately. It is a good idea to start the analysis of any process subset again by running the clustering to see if the cases are similar enough from the process flow point of view.
– Clustering reduces the need for external subject matter business experts. Naturally, it would be nice to have a person who can explain everything, but in real life, those persons are very busy, and some important details are always likely to be forgotten by busy business people.
Acknowledgements.
We thank QPR Software Plc for the practical experiences from a wide variety of customer cases and for funding our research. The algorithms presented in this paper have been implemented in a commercial process mining tool QPR ProcessAnalyzer.
References
1. Bazhenova, E., & Weske, M. (2016, September). Deriving decision models from process models by enhanced decision mining. In International Conference on Business Process Management (pp. 444-457). Springer, Cham.
2. De Koninck, P., De Weerdt, J., & Vanden Broucke, S. K. (2017). Explaining clusterings of process instances. Data Mining and Knowledge Discovery, 31(3), 774-808.
3. De Leoni, M., Van Der Aalst, W. M., & Dees, M. (2016). A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Information Systems, 56, 235-257.
4. De Leoni, M., Dumas, M., & García-Bañuelos, L. (2013, March). Discovering branching conditions from business process execution logs. In International Conference on Fundamental Approaches to Software Engineering (pp. 114-129). Springer, Berlin, Heidelberg.
5. De Medeiros, A. K. A., Guzzo, A., Greco, G., Van Der Aalst, W. M., Weijters, A. J. M. M., Van Dongen, B. F., & Saccà, D. (2007, September). Process mining based on clustering: A quest for precision. In International Conference on Business Process Management (pp. 17-29). Springer, Berlin, Heidelberg.
6. Hinkka, M., Lehto, T., Heljanko, K., & Jung, A. (2017, September). Structural feature selection for event logs. In International Conference on Business Process Management (pp. 20-35). Springer, Cham.
7. Hinkka, M., Lehto, T., Heljanko, K., & Jung, A. (2018, September). Classifying process instances using recurrent neural networks. In International Conference on Business Process Management (pp. 313-324). Springer, Cham.
8. Lehto, T., Hinkka, M., & Hollmén, J. (2016, September). Focusing business improvements using process mining based influence analysis. In International Conference on Business Process Management (pp. 177-192). Springer, Cham.
9. Lehto, T., Hinkka, M., & Hollmén, J. (2017). Focusing business process lead time improvements using influence analysis. In International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA) (pp. 54-67). Rheinisch-Westfaelische Technische Hochschule Aachen.
10. OMG: Decision Model and Notation (DMN) v.1.2, 2019.
11. Seeliger, A., Nolle, T., & Mühlhäuser, M. (2018, September). Finding structure in the unstructured: hybrid feature set clustering for process discovery. In International Conference on Business Process Management (pp. 288-304). Springer, Cham.
12. Song, M., Günther, C. W., & Van Der Aalst, W. M. (2008, September). Trace clustering in process mining. In International Conference on Business Process Management (pp. 109-120). Springer, Berlin, Heidelberg.
13. Souza, C. R. (2014). The Accord.NET Framework. São Carlos, Brazil. http://accord-framework.net