A Python Extension to Simulate Petri nets in Process Mining
M. Pourbafrani, Sandhya Vasudevan, Faizan Zafar, Yuan Xingran, Ravikumar Singh, Wil M. P. van der Aalst
Chair of Process and Data Science, RWTH Aachen University, Germany
{mahsa.bafrani,wvdaalst}@pads.rwth-aachen.de
RWTH Aachen University
{sandhya.vasudevan,faizan.zafar,xingran.yuan,ravikumar.singh}@rwth-aachen.de

Abstract.
The capability of process mining techniques in providing extensive knowledge and insights into business processes has been widely acknowledged. Process mining techniques support discovering process models as well as analyzing process performance and bottlenecks in the past executions of processes. However, process mining tends to be "backward-looking", whereas techniques such as simulation are "forward-looking". For example, process improvement also requires "what-if" analyses. In this paper, we present a Python library which uses an event log to directly generate a simulated event log, with additional options for end-users to specify the duration of activities and the arrival rate. Since the generated simulation model is supported by historical data (event data) and is based on the Discrete Event Simulation (DES) technique, the generated event data is similar to the behavior of the real process.
Keywords: process mining, simulation, discrete event simulation, event log, automatic simulation model generation.
Acknowledgments: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2023 Internet of Production – 390621612. We also thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.

Process mining tools provide unique capabilities to diagnose business processes existing within organizations (e.g., in transaction logs or audit trails), including discovering the running processes as well as the deviations and bottlenecks that occur or exist in the current state of the processes [1]. In all of the proposed tools for simulation in process mining, interaction with the user and user knowledge is an undeniable requirement for designing and running the simulation models. Moreover, most of the approaches depend on external simulation tools. For instance, in [2], the proposed business process simulation technique is based on the BPMN model. All the simulation parameters, together with the BPMN model, are put into a simulation tool such as BIMP for the simulation step. [3] provides a comprehensive platform for modeling stochastic Petri nets; however, the connection to process mining is missing. In [4], the created simulation model is based on CPN Tools, which requires users to have knowledge of discrete event simulation as well as Standard ML (SML) to define functions and capture the output as an event log [5]. In [6], an external tool, i.e., ADONIS, is used for simulating the discovered model and parameters.

Fig. 1: The general framework for discrete event simulation in process mining. The automatic generation of simulation models and the corresponding simulated event logs is possible by starting with an event log, extracting the process model and the performance information, generating random cases, and finally converting the processed activities into events. The user annotations indicate the options for the user to simulate the process with user-defined parameters.
[Figure 1 (diagram). Modules: Process Mining, Discrete Event Simulation, Generating the Simulated Log. Steps from the event log to the simulated event log: calculate arrival rate (statistical analysis); process discovery (Petri net); performance analysis (activity duration); generate new cases; generate the possible flow (trace, next possible activity); process the possible activity; capture the events for the case and activity; convert the simulation clock time to the timestamp.]

It should be noted that oftentimes the user does not need to have in-depth knowledge of the process in order to simulate it, which holds for most commercial tools such as Protos, AnyLogic, and
ARENA. For instance, the user may only need to know how the process will behave if the arrival rate changes so that a new case arrives every 5 minutes.

In process mining, the above-mentioned requirements can be addressed by the concept of Discrete Event Simulation (DES) [7]. DES for business processes has been developed in Java as a plugin of ProM [8]. However, custom options such as the ability to change the duration of activities for future performance analyses are missing. Approaches such as [9] use the same idea in Java but have some drawbacks, e.g., a fixed duration for the case generation step. The generated cases do not overlap in time, which is not the case in reality. Work such as [10] generates a simulation model for business processes which relies on the user's domain knowledge. [11] describes a range of modeling tasks that should be considered when creating a realistic business process simulation model.

Existing process mining tools provide users with a visual representation of process discovery and performance analyses using event data in the form of event logs. Therefore, an approach is needed to play out reality and generate the exact behavior, which makes further analyses in process mining possible. Moreover, as an open-source tool, the library can easily be extended. User options to add capacity to the activities and to extend the case production for different times of the day and week can be implemented.

Research work such as [12] and [13] uses aggregated simulation, which is useful for what-if analyses in a high-level decision-making scenario [14]. The PMSD tool represents the aggregated approach and generates a simulation model at a higher level of aggregation [15].

In this paper, we introduce an easy-to-use, open-source, Python-based application that connects the process mining environment in Python, PM4Py [16], to the general simulation library in Python, SimPy. The latter library is used for discrete event simulation and handles the required system clock in DES. The automatically designed simulation model can be configured with user-defined durations for the activities and a user-defined arrival rate. The final output is an event log based on the given number of cases that can be used further for process mining analyses. The designed framework of the tool is shown in Fig. 1. It is designed on the basis of three main modules: process mining, simulation, and transformation of the generated events into an event log.

Event logs comprise events, where each event refers to a case (process instance), an activity, a timestamp, and any number of additional attributes (e.g., costs, resources, etc.). A set of events forms an event log which can be used in process mining analyses. As shown in Fig. 1, our approach starts with applying process mining techniques on the original event log. Thereby, a process model is discovered in the form of a Petri net which presents the possible flows of activities for the cases. Subsequently, performance analyses provide the case arrival rate, taking the business hours into account, and the average duration of the activities. This information makes the automatic generation of process instances based on the past executions of processes possible.
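To make the event attributes and the derived performance information concrete, the following is a minimal sketch using only the Python standard library. The toy log, the field names (`case`, `activity`, `start`, `complete`), and both helper functions are hypothetical and are not taken from the tool. The sketch also ignores business hours, which the tool considers by default.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Hypothetical toy event log: one dict per event (case, activity, timestamps).
EVENT_LOG = [
    {"case": "1", "activity": "register request",
     "start": "2021-02-01 09:00", "complete": "2021-02-01 09:10"},
    {"case": "1", "activity": "check ticket",
     "start": "2021-02-01 09:15", "complete": "2021-02-01 09:30"},
    {"case": "2", "activity": "register request",
     "start": "2021-02-01 09:20", "complete": "2021-02-01 09:30"},
    {"case": "2", "activity": "check ticket",
     "start": "2021-02-01 09:35", "complete": "2021-02-01 10:00"},
    {"case": "3", "activity": "register request",
     "start": "2021-02-01 09:50", "complete": "2021-02-01 10:00"},
]


def minutes_between(earlier, later):
    """Elapsed minutes between two timestamp strings."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds() / 60


def mean_interarrival_minutes(log):
    """Average gap between the first events of consecutive cases."""
    first_seen = {}
    for event in log:
        case, start = event["case"], event["start"]
        # Timestamps in FMT sort correctly as plain strings.
        if case not in first_seen or start < first_seen[case]:
            first_seen[case] = start
    arrivals = sorted(first_seen.values())
    return mean(minutes_between(a, b) for a, b in zip(arrivals, arrivals[1:]))


def mean_duration_minutes(log):
    """Average service time per activity from start/complete timestamps."""
    durations = defaultdict(list)
    for event in log:
        durations[event["activity"]].append(
            minutes_between(event["start"], event["complete"]))
    return {activity: mean(values) for activity, values in durations.items()}
```

Calling `mean_interarrival_minutes(EVENT_LOG)` and `mean_duration_minutes(EVENT_LOG)` yields the two kinds of values that a simulation of this sort would need as input, namely how often cases arrive and how long each activity takes.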
Fig. 2: The flowchart of the integrated discrete event simulation of the processes using process mining. Each activity runs if it is available, and the clock of the simulation is updated for every new event. New events are a newly arrived case, the end of processing an activity for a case, or the start of the processing of an activity for a case.
[Figure 2 (flowchart), node labels: Start; Import event log; User configuration provided? (Yes/No); Update changes; Discover process model (Petri net); Discover arrival rate and activity duration; Start simulating; Generate a case; Required cases generated? (Yes/No); Pick an available marking from Petri net; Simulate the corresponding activity; Capture the event (case id, activity, timestamp); Is it the final marking? (Yes/No); End.]
We aim to provide a simulation model and the corresponding simulated event log as close to reality as possible. To do so, we perform the following preprocessing steps in the process mining module:

– Process discovery:
  • Maximum length of traces: The presence of loops in the process models (Petri nets) makes the generation of long, unrealistic traces possible. By identifying the maximum trace length and enforcing it as a limit, we restrict the execution of unrealistic loops for the simulated cases.
– Performance analyses:
  • Arrival rate calculation: The business hours are considered by default in calculating the average arrival rate. Moreover, we learn the inter-arrival time distribution from the actual arrival times. The detected distribution is used in the simulation step.
  • Activity duration: By removing outliers from the set of durations for each activity, we provide more robust values for the duration of activities.

Using the distribution of the activities' durations, we implicitly consider the average duration of the resources' time without extracting the resource pool. This aggregated calculation includes the behavior of resources for handling each activity.

Next is the simulation module, in which we generate new cases. In extracting the arrival rate of cases, i.e., the time between the arrivals of consecutive cases, we include the business hours in the calculation to obtain an accurate value. The next step is to discover how the cases are handled in the process w.r.t. the service time of each activity and the possible flow of activities that each case can take. Based on the presence of the start and complete timestamps, the value of the average duration of each activity is captured. The discovered Petri net is also used for generating a possible flow of activities.

(SimPy: https://simpy.readthedocs.io)
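The two preprocessing ideas above, bounding the trace length during play-out and removing duration outliers, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the tool's implementation: the small Petri net `NET`, the policy of discarding an over-long trace (returning `None` so the caller can retry), and the 1.5 × IQR outlier rule are all assumptions introduced here, since the text does not specify how either step is carried out.

```python
import random
from collections import Counter
from statistics import quantiles

# Hypothetical Petri net with a loop: transition -> (input places, output places).
NET = {
    "register request": (["start"], ["p1"]),
    "examine casually": (["p1"], ["p2"]),
    "decide": (["p2"], ["p3"]),
    "reinitiate request": (["p3"], ["p1"]),  # loop back to the examination
    "pay compensation": (["p3"], ["end"]),
}


def enabled(net, marking):
    """Transitions whose input places all hold at least one token."""
    return [t for t, (ins, _) in net.items() if all(marking[p] > 0 for p in ins)]


def fire(net, marking, transition):
    """Consume one token per input place and produce one per output place."""
    ins, outs = net[transition]
    new = Counter(marking)
    new.subtract(ins)
    new.update(outs)
    return +new  # unary plus drops places that hold zero tokens


def play_out(net, initial, final, max_len, rng):
    """Randomly play out one trace; return None if it would exceed max_len."""
    marking, final, trace = Counter(initial), Counter(final), []
    while marking != final:
        choices = enabled(net, marking)
        if not choices or len(trace) >= max_len:
            return None  # dead end or unrealistically long: the caller may retry
        step = rng.choice(choices)
        marking = fire(net, marking, step)
        trace.append(step)
    return trace


def remove_outliers(values):
    """Keep values within 1.5 * IQR of the quartiles (an assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if q1 - spread <= v <= q3 + spread]
```

For example, `play_out(NET, {"start": 1}, {"end": 1}, max_len=4, rng=random.Random(0))` can only return the loop-free trace or `None`, because any pass through `reinitiate request` needs more than four steps. In practice, `max_len` would be set to the bound identified in the process discovery step.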
The provided user options to interact with and modify the simulation process are the following functions:

– Activity duration generates random values based on the extracted values for each activity and the corresponding distribution. The user is able to change the parameters of the distribution.
– Arrival rate uses a normal distribution for generating new cases, and the user is able to change the average arrival rate for the simulated log.
– Case generator produces random cases based on the number of cases provided by the user. It determines the terminating point of the simulation.

The final module is designed to transform the simulated events for the generated cases into event logs. The discrete event simulation clock is converted to real timestamps, and each activity is recorded for the cases with its real timestamp. The flowchart of the simulation module of our tool is shown in Fig. 2. After each newly generated case, it checks whether the number of cases provided by the user has been met. Accordingly, it proceeds with processing the picked marking from the Petri net. Either the outputs provided by the process mining module or the user parameters are used to start the simulation. By selecting the available activity from the Petri net, the simulation module checks whether the previous processing of the activity has finished. In the last step, after performing each possible event (generating a new case or processing an activity), the simulation clock is updated and the data is captured. Since the simulation technique considers the capacity of each activity, the concept of queuing is implicitly covered in the simulated event log. When an activity at full capacity, i.e., processing other cases, is selected for the current case, the case is in the waiting state, which is shown in the performance analyses of the event log.
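The interplay of case generation, activity capacity, queuing, and clock-to-timestamp conversion described above can be illustrated with a small event-queue simulation. The sketch below is written with the standard library only (`heapq`) so that it stays self-contained, whereas the tool itself builds on SimPy. The function `simulate`, its parameters, the linear `route` (instead of a Petri net), and the capacity of one case per activity are assumptions made for illustration.

```python
import heapq
from collections import deque
from datetime import datetime, timedelta
from itertools import count


def simulate(route, sample_duration, sample_interarrival, n_cases, start):
    """Simulate n_cases through a linear route of single-capacity activities.

    All inputs are hypothetical; times are in minutes. Returns rows of
    (case id, activity, complete timestamp).
    """
    events, order = [], count()  # heap of (clock, tie-breaker, kind, case, step)
    clock = 0.0
    for case_id in range(1, n_cases + 1):  # the case count fixes the end point
        clock += sample_interarrival()
        heapq.heappush(events, (clock, next(order), "arrive", case_id, 0))

    busy = {activity: False for activity in route}
    waiting = {activity: deque() for activity in route}
    log = []

    def request(now, case_id, step):
        """Start the activity if it is free, otherwise queue the case."""
        activity = route[step]
        if busy[activity]:
            waiting[activity].append((case_id, step))
            return
        busy[activity] = True
        done = now + sample_duration(activity)
        heapq.heappush(events, (done, next(order), "complete", case_id, step))

    while events:
        now, _, kind, case_id, step = heapq.heappop(events)  # advance the clock
        if kind == "arrive":
            request(now, case_id, step)
            continue
        activity = route[step]
        # Convert the simulation clock (minutes) to a real timestamp.
        log.append((case_id, activity, start + timedelta(minutes=now)))
        busy[activity] = False
        if waiting[activity]:
            request(now, *waiting[activity].popleft())  # serve the next in line
        if step + 1 < len(route):
            request(now, case_id, step + 1)
    return log


if __name__ == "__main__":
    rows = simulate(
        route=["register request", "decide"],
        sample_duration=lambda activity: 10.0 if activity == "register request" else 4.0,
        sample_interarrival=lambda: 5.0,
        n_cases=2,
        start=datetime(2021, 2, 1, 9, 0),
    )
    for row in rows:
        print(row)
```

With the constant samplers used in the demo, the second case arrives while `register request` is still busy with the first one and has to wait, so the waiting time ends up in the recorded timestamps without being modeled explicitly. Replacing the constant samplers with draws from fitted distributions (e.g., `random.gauss` clipped at zero for the arrivals) would bring the sketch closer to the user options listed above.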
The source code of our tool, a tutorial, and a screencast are publicly available (https://github.com/mbafrani/AutomaticProcessSimulation). The tool has been used in multiple academic projects to simulate a process model in different situations and generate different event logs. For instance, for the purpose of time series analyses, different arrival rates for the same process have been selected and the corresponding event logs have been generated with the tool. We use a sample case study to demonstrate the steps and usability of our tool.

Fig. 3: (a) The discovered process model of the example event log using Petri net notation. It includes 8 unique activities (check ticket, decide, examine casually, examine thoroughly, pay compensation, register request, reinitiate request, reject request) and represents the process of handling requests in an organization. (b) The discovered process model of the simulated event log using Petri net notation. Our tool generates the simulated event log directly from the original event log, which captures both time and activity flow features of the original process.
Fig. 4: Part of the simulated event log for the example event log, which is generated in the .csv format. It includes the main attributes of an event log: case id, activity, and timestamp.
Figure 3(a) shows a sample process model of the example event log in the form of a Petri net. We use the sample event log and simulate the process for 1000 cases. Using the same process discovery algorithm for the simulated event log results in the same model, including the concurrency in the model, as shown in Fig. 3(b). The discovered model shows that our tool is able to mimic the process and simulate the model including the time aspects of the process. Part of the simulated log is shown in Fig. 4. The simulated event log has the main attributes of an event log. It captures the case id, which is increased incrementally up to the number defined by the user, the activity names, and the corresponding complete time as the timestamp.

Techniques for past analyses of processes in organizations are well supported in existing academic and commercial process mining tools. However, future analyses for business processes are not fully covered in the current tools. Commonly used options either need knowledge of simulation techniques and modeling, require high interaction with users, or are not accurate enough since they are not supported by real event data. In this paper, we presented a tool which directly uses the event data of a process in the form of an event log and simulates the process with the automatically extracted values as well as user-defined input. The tool is designed to simulate the processes in different scenarios. Since the simulation module is based on the discrete event simulation technique, the simulated event log exhibits behavior similar to that of the real event log.
References
1. W. M. P. van der Aalst, Process mining - data science in action, Second Edition. Springer, 2016.
2. M. Camargo, M. Dumas, and O. G. Rojas, “Simod: a tool for automated discovery of business process simulation models,” pp. 139–143, 2019.
3. S. Baarir, M. Beccuti, D. Cerotti, M. De Pierro, S. Donatelli, and G. Franceschinis, “The GreatSPN tool: recent enhancements,” SIGMETRICS Performance Evaluation Review, vol. 36, pp. 4–9, 03 2009.
4. A. Rozinat, R. S. Mans, M. Song, and W. M. P. van der Aalst, “Discovering simulation models,” Inf. Syst., vol. 34, no. 3, pp. 305–327, 2009.
5. A. V. Ratzer, L. Wells, H. M. Lassen, M. Laursen, J. F. Qvortrup, M. S. Stissing, M. Westergaard, S. Christensen, and K. Jensen, “CPN tools for editing, simulating, and analysing coloured Petri nets,” in Applications and Theory of Petri Nets 2003, 24th International Conference, ICATPN 2003, Eindhoven, The Netherlands, June 23-27, 2003, Proceedings, pp. 450–462, 2003.
6. B. Gawin and B. Marcinkowski, “How close to reality is the as-is business process simulation model?,” Organizacija, vol. 48, no. 3, pp. 155–175, 2015.
7. W. M. P. van der Aalst, “Process mining and simulation: a match made in heaven!,” in Proceedings of the 50th Computer Simulation Conference, SummerSim 2018, Bordeaux, France, July 09-12, 2018, pp. 4:1–4:12, ACM, 2018.
8. B. F. van Dongen, A. K. A. de Medeiros, H. Verbeek, A. Weijters, and W. M. van der Aalst, “The ProM framework: A new era in process mining tool support,” in International Conference on Application and Theory of Petri Nets, pp. 444–454, Springer, 2005.
9. A. Rogge-Solti and M. Weske, “Prediction of business process durations using non-Markovian stochastic Petri nets,” Inf. Syst., vol. 54, pp. 1–14, 2015.
10. L. Pufahl and M. Weske, “Extensible BPMN process simulator,” in Proceedings of the BPM Demo Track and BPM Dissertation Award co-located with 15th International Conference on Business Process Modeling (BPM), 2017.
11. N. Martin, B. Depaire, and A. Caris, “The use of process mining in business process simulation model construction - structuring the field,” Bus. Inf. Syst. Eng., vol. 58, no. 1, pp. 73–87, 2016.
12. M. Pourbafrani, S. J. van Zelst, and W. M. P. van der Aalst, “Scenario-based prediction of business processes using system dynamics,” in On the Move to Meaningful Internet Systems: OTM 2019 Conferences - Confederated International Conferences: CoopIS, ODBASE, C&TC 2019, Rhodes, Greece, October 21-25, 2019, Proceedings, pp. 422–439, 2019.
13. M. Pourbafrani, S. J. van Zelst, and W. M. P. van der Aalst, “Supporting automatic system dynamics model generation for simulation in the context of process mining,” in Business Information Systems - 23rd International Conference, BIS 2020, Colorado Springs, CO, USA, June 8-10, 2020, Proceedings, pp. 249–263, 2020.
14. M. Pourbafrani, S. J. van Zelst, and W. M. P. van der Aalst, “Supporting decisions in production line processes by combining process mining and system dynamics,” in Intelligent Human Systems Integration 2020 - Proceedings of the 3rd International Conference on Intelligent Human Systems Integration (IHSI 2020): Integrating People and Intelligent Systems, February 19-21, 2020, Modena, Italy, pp. 461–467, 2020.
15. M. Pourbafrani and W. M. P. van der Aalst, “PMSD: Data-driven simulation in process mining,” in Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM 2020 co-located with 18th International Conference on Business Process Management, BPM 2020, 2020.
16. A. Berti, S. J. van Zelst, and W. M. P. van der Aalst, “Process mining for Python (PM4Py): Bridging the gap between process- and data science,”