Discovering Generative Models from Event Logs: Data-driven Simulation vs Deep Learning
Manuel Camargo, Marlon Dumas, and Oscar González-Rojas
University of Tartu, Tartu, Estonia, {manuel.camargo, marlon.dumas}@ut.ee
Universidad de los Andes, Bogotá, Colombia, [email protected]
Abstract.
A generative model is a statistical model that is able to generate new data instances from previously observed ones. In the context of business processes, a generative model creates new execution traces from a set of historical traces, also known as an event log. Two families of generative process simulation models have been developed in previous work: data-driven simulation models and deep learning models. Until now, these two approaches have evolved independently and their relative performance has not been studied. This paper fills this gap by empirically comparing a data-driven simulation technique with multiple deep learning techniques, which construct models capable of generating execution traces with timestamped events. The study sheds light into the relative strengths of both approaches and raises the prospect of developing hybrid approaches that combine these strengths.
Keywords:
Process mining · Deep learning · Data-driven simulation
An event log is a collection of execution traces of a business process. Each trace in a log consists of a sequence of events, and each event consists of a process instance (case) identifier, an activity label, an activity start and end timestamp, and possibly also the resource who performed the activity and other attributes. A generative model of a business process is a statistical model constructed from an event log, which is able to generate traces that resemble those observed in the log as well as other traces of the process. Generative process models have several applications, including anomaly detection [13], predictive monitoring [17], and what-if analysis [3]. Two families of generative models have been studied in the literature: Data-Driven Simulation (DDS) and Deep Learning (DL) models.

DDS models are discrete-event simulation models constructed from an event log. Several authors have proposed techniques for discovering DDS models, ranging from semi-automated techniques [12] to automated ones [3,14]. A DDS model is generally constructed by first discovering a process model from an event log and then fitting a number of parameters (e.g. mean inter-arrival rate, branching probabilities, etc.) in a way that maximizes the similarity between the traces that the DDS model generates and those in (a subset of) the event log.
On the other hand, DL generative models are machine learning models consisting of interconnected layers of artificial neurons adjusted based on input-output pairs in order to maximize accuracy. Generative DL models have been widely studied in the context of predictive process monitoring [5,11,17,18], where they are used to generate the remaining path (suffix) of an incomplete trace by repeatedly predicting the next event. It has been shown that these models can also be used to generate entire traces [2].

To date, the relative accuracy of these two families of generative process models has not been studied, barring a study that compares DL models vs automatically discovered process models that generate events without timestamps [16]. This paper fills this gap by empirically comparing these approaches using five event logs with varying structural and temporal characteristics. Based on the evaluation results, the paper discusses the relative strengths and potential synergies of these approaches.

The paper is organized as follows. Sections 2 and 3 review DDS and DL generative modeling approaches, respectively, and introduce the approaches included in the evaluation. Section 4 presents the evaluation setup, while Section 5 discusses the results. Finally, Section 6 concludes and outlines future work.
Business Process Simulation (BPS) is a quantitative process analysis technique in which a discrete-event model of a process is stochastically executed a number of times, and the resulting simulated execution traces are used to compute aggregate performance measures such as the average waiting times of activities or the average cycle time of the process [4].

Typically, a BPS model consists of a process model enhanced with time and resource-related parameters such as the inter-arrival time of cases and its associated Probability Distribution Function (PDF), the PDFs of each activity's processing times, a branching probability for each conditional branch in the process model, and the resource pool responsible for performing each activity type in the process model [4]. Such BPS models are stochastically executed by creating new cases according to the inter-arrival time PDF, and simulating the execution of each case following the control-flow semantics of the process model and the following activity execution rules: (i) if an activity in a case is enabled, and there is an available resource in the pool associated to this activity, the activity is started and allocated to one of the available resources in the pool; (ii) when the completion time of an activity is reached, the resource allocated to the activity is made available again. Hence, the waiting time of an activity is entirely determined by the availability of a resource. Resources are assumed to be eager: as soon as a resource is assigned to an activity, the activity is started.
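To make these execution rules concrete, the sketch below simulates a two-activity sequential process under the eager-resource assumption using the simpy library. It is only an illustration, not part of Simod: the process structure, pool sizes, and parameter values are assumptions.

```python
import random
import simpy

# Illustrative simulation parameters (assumed values, not derived from any log)
MEAN_INTERARRIVAL = 30.0                        # minutes, exponential PDF
ACTIVITY_DURATIONS = {"A": 20.0, "B": 45.0}     # mean processing time per activity
RESOURCE_POOLS = {"A": "Clerks", "B": "Officers"}
POOL_SIZES = {"Clerks": 2, "Officers": 1}

def run_case(env, case_id, pools, log):
    """Execute one case through the sequential process A -> B."""
    for activity in ["A", "B"]:
        pool = pools[RESOURCE_POOLS[activity]]
        enabled_at = env.now
        with pool.request() as req:      # wait until a resource in the pool is free
            yield req                    # eager: start as soon as a resource is allocated
            start = env.now
            yield env.timeout(random.expovariate(1.0 / ACTIVITY_DURATIONS[activity]))
            end = env.now
        # the resource is released here, when the completion time is reached
        log.append((case_id, activity, enabled_at, start, end))

def arrivals(env, pools, log):
    """Create new cases according to the inter-arrival time PDF."""
    case_id = 0
    while True:
        yield env.timeout(random.expovariate(1.0 / MEAN_INTERARRIVAL))
        env.process(run_case(env, case_id, pools, log))
        case_id += 1

env = simpy.Environment()
pools = {name: simpy.Resource(env, capacity=size) for name, size in POOL_SIZES.items()}
simulated_log = []
env.process(arrivals(env, pools, simulated_log))
env.run(until=8 * 60)   # simulate one 8-hour period
print(simulated_log[:5])
```

In this sketch, the gap between an event's enablement time and its start time is caused exclusively by resource contention; this modeling assumption becomes relevant when interpreting the evaluation results later in the paper.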
A key ingredient for BPS is the availability of a BPS model that accurately reflects the actual dynamics of the process. Traditionally, BPS models are created manually by domain experts by gathering data from interviews, contextual inquiries, and on-site observation. In this approach, the accuracy of the BPS model is limited by the accuracy of the process model used as a starting point. Several techniques for discovering BPS models from event logs have been proposed [12,14]. These approaches automate the extraction of the process model from an event log, and then enhance this model with all the simulation parameters derived from the event log (e.g. arrival rate). In this paper, we use the term DDS model to refer to a BPS model discovered from an event log.

Existing approaches for discovering a DDS model from an event log can be classified into two categories. The first category consists of approaches that provide conceptual guidance to discover BPS models. For example, [12] discusses how PM techniques can be used to extract, validate, and tune BPS model parameters, without seeking to provide fully automated support. Similarly, [19] outlines a series of steps to construct a DDS model using process mining techniques. The second category of approaches seeks to automate the extraction of simulation parameters. For example, [14] proposes a pipeline for constructing a DDS model using process mining techniques. However, in this approach, the tuning of the simulation model (i.e., fitting the parameters to the data) is left to the user.

In this paper, we use a DDS discovery method, namely the Simod tool [3], which automates both the construction and the tuning of the DDS model. Simod combines several PM techniques to automate the generation and validation of BPS models from an event log. Fig. 1 illustrates the steps of the Simod method: Pre-processing, Processing, Simulation, and Post-processing.

Fig. 1: Pipeline of Simod to generate process models.

In the Pre-processing stage, Simod extracts a BPMN model from data and guarantees its quality and coherence with the event log. The first step is the
Control Flow Discovery, using the Split Miner algorithm [1], which is known for being one of the fastest, simplest, and most accurate discovery algorithms. Next, Simod applies
Trace alignment to assess the conformance between the discovered process model and each trace in the input log. The tool provides options for handling non-conformant traces, via removal, replacement, or repair, to ensure full conformance, which is needed in the following stages.

In the
Processing stage, Simod extracts the simulation parameters and assembles them into a single BPS model. The extracted parameters correspond to the
Resource pools involved in the process, the probability density function (PDF) of
Inter-arrival times and
Activity durations, and the branching probabilities. The resource pool is discovered using the algorithm proposed by Song and van der Aalst [15]; likewise, the resources are assigned to the different activities according to the frequency of execution. The inter-arrival time and activity duration PDFs are discovered by fitting a collection of possible distribution functions to
the data series, selecting the one that yields the smallest standard error. For calculating the branching probabilities, the tool offers two options: assign equal values to each conditional branch, or compute the traversal frequencies of the conditional branches by replaying the event log on the process model. Finally, once all the simulation parameters have been compiled, they are assembled with the BPMN model into a single data structure that can be interpreted by a discrete-event simulator (e.g., Bimp) in the
Simulation step.

In the
Post-processing stage, Simod assesses the similarity between the event log generated by a simulation and the ground-truth log, using a measure of similarity between event logs (ELS), which we introduce in Section 4. Simod then uses a Bayesian hyperparameter optimizer to explore the search space of all possible Simod parameter settings, in search of a configuration of parameters that leads to the highest ELS between the simulated log and the ground-truth log.
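This kind of tuning loop can be illustrated with a minimal sketch based on the hyperopt library. It is not Simod's actual code: the searched parameters (two Split Miner thresholds and the gateway-probability strategy), their ranges, and the simulate_and_score placeholder are illustrative assumptions; only the budget of 50 configurations follows the setup described in Section 4.

```python
from hyperopt import fmin, tpe, hp, Trials

def simulate_and_score(params):
    # Placeholder: in a Simod-like pipeline this would build the BPS model with
    # `params`, run five simulations, and return the average ELS against the
    # validation sub-log. A dummy value keeps the sketch runnable end-to-end.
    return 0.5

def objective(params):
    # Bayesian optimizers minimize, so we return 1 - ELS as the loss
    return 1.0 - simulate_and_score(params)

search_space = {
    # Split Miner sensitivity parameters (illustrative ranges)
    "epsilon": hp.uniform("epsilon", 0.0, 1.0),
    "eta": hp.uniform("eta", 0.0, 1.0),
    # Branching probabilities: equal split vs. replay-based frequencies
    "gate_management": hp.choice("gate_management", ["equiprobable", "discovered"]),
}

trials = Trials()
best = fmin(fn=objective, space=search_space, algo=tpe.suggest,
            max_evals=50, trials=trials)   # 50 configurations, as in Section 4
print(best)
```

A TPE-based optimizer like this one focuses the evaluation budget on regions of the parameter space that previously produced high ELS values, which makes it preferable to plain grid or random search when each evaluation requires several simulation runs.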
In recent years, the use of generative deep learning models has been widely studied in the field of predictive process monitoring. Multiple proposals [2,5,17] have demonstrated that such models achieve high accuracy for tasks such as predicting the next event of a running case (and its timestamp or other attributes such as the resource) as well as predicting the remaining path of an incomplete case.

In broad terms, a Deep Learning (DL) model is a network composed of multiple interconnected layers of neurons (perceptrons), which perform non-linear transformations of data [7]. The main goal of these transformations is to train the network to learn the behaviors/patterns observed in the data. Theoretically, the more layers of neurons there are in the system, the more it becomes possible to detect higher-level patterns in the data thanks to the composition of complex functions [9]. In the literature, multiple neural network architectures (e.g., feed-forward, convolutional, autoencoders) have been used in domains such as natural language processing or image processing. In the field of predictive process monitoring, the most common type of neural network is the Recurrent Neural Network (RNN), due to the sequential nature of the input event logs.

In particular, Evermann et al. [5] proposed an approach to generate the most likely remaining sequence of events (suffix) starting from a prefix of an ongoing case. However, this architecture cannot handle numerical features, and hence it cannot generate sequences of timestamped events. The approaches of Lin et al. and Taymouri et al. [11,18] share this inability to predict timestamps and durations. An alternative approach by Tax et al. [17] can predict timestamps; however, it lacks flexibility in the management of high-dimensional inputs because it one-hot encodes categorical features. As a result, its accuracy deteriorates as the number of categorical features increases. In [16], the same authors compare the performance of several techniques for predicting the next element in a sequence using real-life datasets. This latter study addresses the problem of predicting the next event's type, but it does not consider the problem of simultaneously predicting the next event and its timestamp. Finally, Nolle et al. [13] propose a neural network architecture called BINet for real-time anomaly detection in business process executions.

In this paper, we use the DeepGenerator method proposed in [2] to train generative DL models for the task of generating complete event logs. The DeepGenerator method extends previous approaches for training DL models with LSTM architectures, by including dimensionality control techniques such as the use of n-grams and embedded dimensions, as well as random sampling over the predicted probability distribution to select the category of the next predicted event. Since this method was designed to generate events with a single timestamp, we extended it in this paper to support the prediction of two timestamps. Similarly, we generalized the method and use it to train another kind of RNN architecture, known as GRU, to broaden the scope of the evaluation performed in this paper. Fig. 2 synthesizes the phases and stages for building predictive models with our method.

Fig. 2: Phases and steps for building DL models.

In the
Pre-processing phase, we carry out a
Transformation of the event log by generating n-sized sequences of events composed of activities, roles, and times. Here, we use encoding and scaling techniques depending on the data type of the event attribute (i.e., categorical or continuous). Categorical attributes, i.e., activities and roles, are encoded using the embedded dimensions technique. This method helps us to keep the dimensionality low, which enhances the performance of the neural network. The embedded dimensions are n-dimensional spaces in which the models can map the activities and roles according to their proximity. Continuous attributes, that is, start and end timestamps, are first relativized and later scaled to the range [0, 1]. The relativization is carried out by calculating two features: the event duration and the time-between-events. The duration of an event (a.k.a. the processing time) is the time difference between its end and its start timestamp. The time-between-events is the difference between the event's start and the end of the immediately preceding event in the same trace (a.k.a. the waiting time). All relative times are scaled using normalization or log-normalization, depending on the variability of the times in the event log. Once the features are encoded, the next step is the Sequences creation step. In this step, we extract n-grams from each trace to create the example sequences to train the predictive models. One n-gram is generated for each step of the process execution, and this is done for each attribute independently. Hence, we use four independent
inputs: activity prefixes, role prefixes, relativized event durations, and relativized time-between-events.
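As a hedged illustration of the sequence-creation step, the sketch below builds fixed-size prefixes (n-grams) and next-event targets from a single trace; the padding token and window size are assumptions, and the same procedure would be applied independently to activities, roles, and the two time features.

```python
def build_ngrams(trace, n=5, pad="<pad>"):
    """Build fixed-size prefixes (n-grams) and next-event targets from one trace."""
    padded = [pad] * n + list(trace)   # left-pad so the first event has an empty prefix
    examples = []
    for i in range(len(trace)):
        prefix = padded[i:i + n]       # the n events preceding position i
        target = trace[i]              # the event the model should predict next
        examples.append((prefix, target))
    return examples

# Example: a three-event trace with a window of size 3
print(build_ngrams(["A", "B", "C"], n=3))
# [(['<pad>', '<pad>', '<pad>'], 'A'),
#  (['<pad>', '<pad>', 'A'], 'B'),
#  (['<pad>', 'A', 'B'], 'C')]
```

In practice, an artificial end-of-trace event is also appended to each trace so that the model can learn when a case finishes.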
In the Model Training Phase, one of three possible stacked base architectures is selected for training. The network structures vary according to whether or not they share intermediate LSTM or GRU layers, considering that sometimes sharing information can help to differentiate execution patterns. Fig. 3 presents the general structure of the defined architectures.
Fig. 3: DL architectures: (a) Specialized: does not share any information; (b) Shared categorical: concatenates the inputs related to activities and roles and shares the first layer; (c) Full shared: completely shares the first layer.
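The sketch below outlines how the "full shared" variant could be assembled with Keras. The layer sizes, number of layers, and loss functions are assumptions for illustration; the actual DeepGenerator configurations (and the LSTM vs. GRU choice) are selected by the hyperparameter search described in Section 4.

```python
from tensorflow.keras import layers, Model

# Illustrative sizes (assumptions, not the values used in the paper)
N_GRAM = 5          # prefix length
N_ACTIVITIES = 20   # number of activity labels (incl. start/end tokens)
N_ROLES = 8         # number of resource roles
EMB_DIM = 4         # size of the embedded dimensions
UNITS = 50          # recurrent units per layer

# Four independent inputs: activity prefixes, role prefixes, and the two scaled time features
act_in = layers.Input(shape=(N_GRAM,), name="activities")
role_in = layers.Input(shape=(N_GRAM,), name="roles")
dur_in = layers.Input(shape=(N_GRAM, 1), name="durations")
wait_in = layers.Input(shape=(N_GRAM, 1), name="time_between_events")

# Categorical attributes are mapped into low-dimensional embedded spaces
act_emb = layers.Embedding(N_ACTIVITIES, EMB_DIM)(act_in)
role_emb = layers.Embedding(N_ROLES, EMB_DIM)(role_in)

# "Full shared" variant: all inputs are concatenated and fed to one shared recurrent layer
merged = layers.Concatenate()([act_emb, role_emb, dur_in, wait_in])
shared = layers.LSTM(UNITS, return_sequences=True)(merged)

# Specialized second layers and outputs per predicted attribute
act_out = layers.Dense(N_ACTIVITIES, activation="softmax", name="next_activity")(
    layers.LSTM(UNITS)(shared))
role_out = layers.Dense(N_ROLES, activation="softmax", name="next_role")(
    layers.LSTM(UNITS)(shared))
time_out = layers.Dense(2, activation="linear", name="next_times")(
    layers.LSTM(UNITS)(shared))   # scaled duration and time-between-events

model = Model([act_in, role_in, dur_in, wait_in], [act_out, role_out, time_out])
model.compile(optimizer="adam",
              loss={"next_activity": "categorical_crossentropy",
                    "next_role": "categorical_crossentropy",
                    "next_times": "mae"})
model.summary()
```

The GRU variants are obtained by replacing layers.LSTM with layers.GRU; the specialized and shared-categorical architectures differ only in which inputs feed the first recurrent layer.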
Finally, the Post-processing Phase includes the mechanisms for generating complete traces from a zero-size prefix. This is done by continuously feeding the model back with each newly generated event, until a finalization event is generated. The category of the next event is selected randomly following the predicted probability distribution. This mechanism turns out to be the most suitable for the task of generating complete event logs because it avoids getting stuck in the higher probabilities, as was tested in [2].
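The following sketch illustrates this feedback loop for the activity attribute alone: it starts from an empty (all-padding) prefix and repeatedly samples the next activity from the predicted distribution. The model object, token indices, and maximum length are assumptions, not the actual DeepGenerator interface, and the full method feeds back roles and times as well.

```python
import numpy as np

N_GRAM_SIZE = 5        # assumed prefix window used during training
END_ACTIVITY = 0       # assumed index of the artificial end-of-trace activity
PAD_ACTIVITY = 1       # assumed index used to pad empty prefix positions
MAX_TRACE_LEN = 100    # safety bound to stop runaway generation

def generate_trace(model):
    """Generate one complete trace by feeding each sampled event back to the model."""
    prefix = [PAD_ACTIVITY] * N_GRAM_SIZE   # zero-size prefix: all padding
    trace = []
    while len(trace) < MAX_TRACE_LEN:
        # Hypothetical model call: returns a probability distribution over activities
        probs = model.predict(np.array([prefix]))[0]
        # Random sampling (instead of argmax) avoids always taking the most likely event
        next_activity = int(np.random.choice(len(probs), p=probs))
        if next_activity == END_ACTIVITY:
            break
        trace.append(next_activity)
        prefix = prefix[1:] + [next_activity]   # slide the n-gram window
    return trace
```

Selecting the argmax instead of sampling would make every generated trace follow the same most-likely path, which is why random sampling is used when the goal is to generate a whole log rather than to predict a single most likely suffix.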
This section presents an empirical comparison of DDS and DL generative process models. The evaluation aims at addressing the following questions: what is the relative accuracy of these approaches when it comes to generating traces of events without timestamps, and what is their relative accuracy when it comes to generating traces of events with timestamps?
We evaluated the selected approaches using five event logs that contain both start and end timestamps:

– The event log of a manufacturing production (MP) process contains the steps exported from an Enterprise Resource Planning (ERP) system [10].
– The event log of a purchase-to-pay (P2P) process is a synthetic log generated from a model not available to the authors. This log is provided as part of the Fluxicon Disco tool (https://fluxicon.com/).
– The event log from an Academic Credentials Recognition (ACR) process of a Colombian University was gathered from its BPM system (Bizagi).
– The W subset of the BPIC2012 event log (https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f), which is a log of a loan application process from a Dutch financial institution. The W subset of this log is composed of the events corresponding to activities performed by human resources (i.e. only activities that have a duration).
– The W subset of the BPIC2017 event log (https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b), which is an updated version of the BPIC2012 log. We carried out the extraction of the W subset by following the recommendations reported by the winning teams participating in the BPIC 2017 challenge.

Table 1 summarizes the characteristics of these logs. The BPIC2017W and BPIC2012W logs have the largest number of traces and events, while the MP, P2P, and ACR logs have fewer traces but a higher average number of events per trace.

Event log  | Num. traces | Num. events | Num. activities | Avg. activities per trace | Max. activities per trace | Mean duration | Max. duration
MP         | 225         | 4953        | 26              | 22                        | 177                       | 20.6 days     | 87 days 10 hours
P2P        | 608         | 9119        | 21              | 14.9                      | 44                        | 21.5 days     | 108 days 7 hours
ACR        | 954         | 6870        | 18              | 7.2                       | 23                        | 14.9 days     | 135 days 19 hours
BPIC2012W  | 8616        | 59302       | 6               | 6.88                      | 74                        | 8.9 days      | 85 days 20 hours
BPIC2017W  | 30276       | 240854      | 8               | 7.9                       | 65                        | 12.7 days     | 286 days 1 hour

Table 1: Event logs description
To measure the accuracy of a generative process model, we use it to generate an event log (multiple times) and we measure the average similarity between the generated logs and a ground-truth event log. To this end, we define four measures of similarity between pairs of logs: Control-Flow Log Similarity (CFLS), Mean Absolute Error (MAE) of cycle times, Event Log Similarity (ELS), and Earth Mover's Distance (EMD) of the histograms of activity processing times.

CFLS is defined based on a measure of distance between pairs of traces: one trace coming from the original event log and the other from the generated log. We first convert each trace into a sequence of activities (i.e. we drop the timestamps and other attributes). In this way, a trace becomes a sequence of symbols (i.e. a string). We then measure the difference between two traces using the Damerau-Levenshtein distance, which is the minimum number of edit operations necessary to transform one string (a trace in our context) into another. The supported edit operations are insertion, deletion, substitution, and transposition. Transpositions are allowed without penalty when two activities are concurrent, meaning that they appear in any order, i.e. given two activities, we observe both AB and BA in the log. Next, we normalize the resulting Damerau-Levenshtein distance by dividing the number of edit operations by the length of the longest sequence. We then define the control-flow trace similarity as one minus the normalized Damerau-Levenshtein distance. Given this trace similarity notion, we pair each trace in the generated log with a trace in the original log, in such a way that the sum of the trace similarities between the paired traces is maximal. This pairing is done using the Hungarian algorithm for computing optimal alignments [8]. Finally, we define the CFLS between the real and the generated log as the average similarity of the optimally paired traces.

The cycle time MAE measures the temporal similarity between two logs. The absolute error of a pair of traces T1 and T2 is the absolute value of the difference between the cycle time of T1 and that of T2. The cycle time MAE is the mean of the absolute errors over a collection of paired traces. As for the CFLS measure, we use the Hungarian algorithm to pair each trace in the generated log with a corresponding trace in the original log.

The cycle time MAE is a rough measure of the temporal similarity between the traces in the original and the generated log. It does not take into account the timing of the events in a trace, only the cycle time of the full trace. To complement the cycle time MAE, we use the Earth Mover's Distance (EMD) between the normalized histogram of the mean durations of the activities in the ground-truth log and the same histogram computed from the generated log. The EMD between two histograms H1 and H2 is the minimum number of units that need to be added to, removed from, or transferred across columns of H1 in order to transform it into H2. The EMD is zero if the observed mean activity durations in the two logs are identical, and it tends to one the more they differ.

The above measures focus either on the control-flow or on the temporal perspective. To complement them, we use a measure that combines both perspectives, namely the ELS as defined in [3]. This measure is defined in the same way as CFLS above, except that it uses a distance measure between traces that takes into account both the activity labels and the timestamps of the events. This distance measure between traces is called Business Process Trace Distance (BPTD). The BPTD measures the distance between traces composed of events that occur in time intervals. This metric is an adaptation of the CFLS metric that, in the case of label matching, assigns a penalty based on the differences in times. BPTD also supports parallelism, which commonly occurs in business processes. We call ELS the generalization of the BPTD that measures the distance between two event logs using the Hungarian algorithm.
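The CFLS computation can be summarized with the sketch below, which pairs generated and ground-truth traces with the Hungarian algorithm (scipy's linear_sum_assignment) over normalized Damerau-Levenshtein distances. It is a simplified illustration: unlike the measure described above, it does not apply the concurrency-aware, penalty-free transposition rule.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def damerau_levenshtein(s1, s2):
    """Optimal string alignment distance between two activity sequences."""
    d = np.zeros((len(s1) + 1, len(s2) + 1), dtype=int)
    d[:, 0] = np.arange(len(s1) + 1)
    d[0, :] = np.arange(len(s2) + 1)
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + cost)   # substitution
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                d[i, j] = min(d[i, j], d[i - 2, j - 2] + 1)   # transposition
    return d[len(s1), len(s2)]

def cfls(real_traces, generated_traces):
    """Control-Flow Log Similarity between two lists of activity sequences."""
    # Pairwise normalized distances between generated and real traces
    dist = np.array([[damerau_levenshtein(g, r) / max(len(g), len(r), 1)
                      for r in real_traces] for g in generated_traces])
    # Hungarian algorithm: pairing that minimizes the total distance
    rows, cols = linear_sum_assignment(dist)
    return 1.0 - dist[rows, cols].mean()

# Example: traces as sequences of activity labels
real = [["A", "B", "C"], ["A", "C", "B"]]
generated = [["A", "B", "C"], ["A", "B"]]
print(cfls(real, generated))
```

The ELS measure follows the same pairing scheme, with the string distance replaced by BPTD.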
The aim of the evaluation is to compare the accuracy of DDS models vs DL models discovered from event logs. Fig. 4 presents the pipeline we followed.

Fig. 4: Experimental pipeline.

We used the hold-out method with a temporal split criterion to divide the event logs into two folds: 70% for training and 30% for testing. Next, we use the training fold to train the DDS and the DL models. We use Simod's hyperparameter optimizer to tune the DDS model. The optimizer is set to explore 50 parameter configurations with five simulation runs per configuration, using the first 80% of the training fold to construct candidate DDS models and the remaining 20% for validation. We retained the DDS model that gave the best results on the validation sub-fold in terms of ELS averaged across the five runs. Next, for each model family (LSTM and GRU) we apply random search for hyperparameter optimization. As in the DDS approach, we explore 50 random configurations with five runs each, using 80% of the training fold for model construction and 20% for validation. The above led us to one DDS, one LSTM, and one GRU model per log. We then generated ten logs per retained model. To ensure comparability, each generated log was of the same size (number of traces) as the testing fold of the original log. For each generated log, we then compare it to the testing fold using the ELS, CFLS, EMD, and MAE measures defined above. To smooth out stochastic variations, we report the mean of each of these measures across the ten logs generated from each model.
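A temporal hold-out split of this kind can be sketched as follows. The 70/30 proportion follows the description above, while the use of pandas, the column names, and the ordering of cases by their first event are assumptions about the log representation.

```python
import pandas as pd

def temporal_split(log: pd.DataFrame, train_ratio: float = 0.7):
    """Split an event log into training and test folds by case start time.

    `log` is assumed to have columns 'case_id' and 'start_timestamp'.
    """
    # Order cases by the timestamp of their first event
    case_starts = (log.groupby("case_id")["start_timestamp"]
                      .min()
                      .sort_values())
    n_train = int(len(case_starts) * train_ratio)
    train_cases = set(case_starts.index[:n_train])
    train_fold = log[log["case_id"].isin(train_cases)]
    test_fold = log[~log["case_id"].isin(train_cases)]
    return train_fold, test_fold

# Usage: the training fold is then split again (80/20) for hyperparameter search
# train, test = temporal_split(event_log)
```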
Table 2 presents the evaluation results. The Event log column identifies the evaluated log, and the Model family column refers to the type of model (DDS, LSTM, GRU). The ELS, CFLS, MAE, and EMD columns present the accuracy measures. Note that ELS and CFLS are similarity measures (higher is better) whereas MAE and EMD are error/distance measures (lower is better).
Type of log | Event log | Model family | ELS  | CFLS | MAE       | EMD
Synthetic   | P2P       | GRU          | 0.63 |      |           |
Synthetic   | P2P       | LSTM         |      |      |           |
Real        | ACR       | GRU          | 0.58 | 0.58 |           |
Real        | ACR       | LSTM         | 0.56 | 0.56 | 369663.50 | 2.58
Real        | ACR       | DDS          |      |      |           |
Real        | BPI2017W  | GRU          | 0.83 |      |           |
Real        | BPI2017W  | LSTM         |      |      |           |
Table 2: Evaluation results

Both DDS and DL approaches gave similar results on the artificial log (P2P). This may be due to the fact that this log has a predictable behavior, which is equally well captured by both families of approaches. In three of the four real-life logs (MP, ACR, and BPI2012W), the DDS model has higher accuracy in terms of control-flow similarity, particularly in the ACR log. However, in the BPI2017W log, the DL models yielded considerably higher control-flow similarity. This can be explained by the fact that this log is much larger than the others, and DL models generally excel when fed large amounts of samples. Turning our attention to temporal similarity, we note that DL models led to the best MAE results across all event logs. An example of this can be observed in the BPI2012W log, in which the best DL model obtained half the MAE of the best simulation model. In the case of EMD, there is no clear winner. All approaches are able to accurately reproduce the distribution of processing times of activities.

The results indicate that DDS models perform well when it comes to capturing the occurrence and order of activities (control-flow), even with smaller training datasets. However, the DL models are more accurate when it comes to capturing the temporal perspective and, as expected, they perform particularly well on the largest dataset.

A possible explanation is that event logs of business processes (at least the ones included in this evaluation) follow certain normative pathways that can be captured sufficiently well by automatically discovered process models. On the other hand, the waiting times in these event logs are not adequately captured by DDS models. DL models, in contrast, are able to find patterns in the observed event timestamps. Since the EMD values are similar for DDS and DL models, we conclude that the difference in temporal accuracy between these two types of models stems from the fact that DL models are better able to predict the waiting times of activities (rather than the processing times). The inability of DDS models to accurately capture the waiting times can be attributed to the fact that these models rely on the assumption that waiting times can be fully explained by the availability of resources (i.e. resource contention is the sole cause of waiting times) and that they operate under the assumption of eager resources, as discussed in Section 2 (i.e. resources start an activity as soon as it is allocated to them). DL models, on the other hand, simply try to find the best possible fit to the observed waiting times.

The results of this paper are restricted to the five event logs used for the evaluation. Obtaining statistically significant results would require a larger collection of event logs with the required characteristics. Similarly, this work does not include all possible DL architectures applied in the context of business processes; the results are restricted to LSTM and GRU models.
In this paper, we compared the accuracy of two approaches to discover generative models from event logs: Data-Driven Simulation (DDS) and Deep Learning (DL). The results suggest that DDS models are suitable for capturing the sequence of activities (or possibly other categorical attributes) of a process. On the other hand, DL models outperform DDS models when predicting the timing of activities, specifically the waiting times between activities.

This observation raises the prospect of combining these approaches into hybrid techniques that take advantage of their relative strengths. In such hybrid approaches, the DDS model would capture the control-flow perspective, while the DL model would capture the temporal dynamics, particularly waiting times. The DDS model would also provide an interpretable model that users can change in order to define "what-if" scenarios, e.g. a what-if scenario where an activity is removed or a new activity is added. The challenges here are: (i) how to integrate the DDS model with the DL model; and (ii) how to incorporate the information in a what-if scenario into a DL model. A possible direction to tackle the latter challenge is to adapt existing techniques to incorporate domain knowledge (e.g. the fact that an activity has been deleted) into the output of a DL model [6].
Reproducibility package
The scripts and datasets required to reproduce the reported evaluation can be found at: https://github.com/AdaptiveBProcess/DDSvsDL
Acknowledgments
Research supported by the European Research Council (PIX project).
References
1. Augusto, A., Conforti, R., Dumas, M., Rosa, M.L.: Split miner: Discovering accurate and simple business process models from event logs. In: 2017 IEEE International Conference on Data Mining (ICDM). pp. 1–10. IEEE (2017)
2. Camargo, M., Dumas, M., González-Rojas, O.: Learning accurate LSTM models of business processes. In: Proceedings of BPM'2019. LNCS, vol. 11675, pp. 286–302. Springer (2019)
3. Camargo, M., Dumas, M., González-Rojas, O.: Automated discovery of business process simulation models from event logs. Decis Support Syst, 113284 (2020)
4. Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A.: Fundamentals of Business Process Management. Springer, 2nd edn. (2018)
5. Evermann, J., Rehse, J.R., Fettke, P.: Predicting process behaviour using deep learning. Decis Support Syst, 129–140 (2017)
6. Francescomarino, C.D., Ghidini, C., Maggi, F.M., Petrucci, G., Yeshchenko, A.: An eye into the future: Leveraging a-priori knowledge in predictive business process monitoring. In: Proceedings of BPM'2017. LNCS, vol. 10445, pp. 252–268. Springer (2017)
7. Hao, X., Zhang, G., Ma, S.: Deep learning. Int J Semant Comput, 417–439 (2016)
8. Kuhn, H.W.: The Hungarian method for the assignment problem. Nav Res Logist Q, 83–97 (1955)
9. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature, 436–444 (2015)
10. Levy, D.: Production analysis with process mining technology (2014). https://doi.org/10.4121/uuid:68726926-5ac5-4fab-b873-ee76ea412399
11. Lin, L., Wen, L., Wang, J.: MM-Pred: A deep predictive model for multi-attribute event sequence. In: Proceedings of the 2019 SIAM International Conference on Data Mining. pp. 118–126. Society for Industrial and Applied Mathematics (2019)
12. Martin, N., Depaire, B., Caris, A.: The use of process mining in business process simulation model construction. Bus Inf Syst Eng, 73–87 (2016)
13. Nolle, T., Seeliger, A., Mühlhäuser, M.: BINet: Multivariate business process anomaly detection using deep learning. In: Business Process Management. BPM 2018. LNCS, vol. 11080, pp. 271–287. Springer (2018)
14. Rozinat, A., Mans, R.S., Song, M., van der Aalst, W.M.P.: Discovering simulation models. Inform Syst, 305–327 (2009)
15. Song, M., van der Aalst, W.M.P.: Towards comprehensive support for organizational mining. Decis Support Syst 46