Event Stream-Based Process Discovery using Abstract Representations
Sebastiaan J. van Zelst, Boudewijn F. van Dongen, Wil M.P. van der Aalst
arXiv [cs.DB], April 2017

S.J. van Zelst*, B.F. van Dongen, and W.M.P. van der Aalst
Department of Mathematics and Computer Science
Eindhoven University of Technology
P.O. Box 513, 5600 MB Eindhoven, The Netherlands

April 27, 2017

Abstract
The aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques relies on an event log as an input. An event log is a static source of historical data capturing the execution of a business process. In this paper we focus on process discovery relying on online streams of business process execution events. Learning process models from event streams poses both challenges and opportunities, i.e. we need to handle unlimited amounts of data using finite memory and, preferably, constant time. We propose a generic architecture that allows for adopting several classes of existing process discovery techniques in context of event streams. Moreover, we provide several instantiations of the architecture, accompanied by implementations in the process mining tool-kit ProM. Using these instantiations, we evaluate several dimensions of stream-based process discovery. The evaluation shows that the proposed architecture allows us to lift process discovery to the streaming domain.

* Corresponding Author: [email protected]
http://promtools.org

1 Introduction

Process mining [2] aims at understanding and improving business processes. The field consists of three main branches, i.e. process discovery, conformance checking and process enhancement. Process discovery aims at discovering a process model based on event data. Conformance checking is concerned with assessing whether a process model and event data conform to each other in terms of possible behaviour. Process enhancement is concerned with improvement of process models based on knowledge gained from event data, e.g. a process model is extended with performance diagnostics based on event data.

Several process discovery algorithms exist [4, 5, 24, 30, 45, 48]. These algorithms all use an event log as an input. An event log is a static data source describing sequences of executed business process activities recorded over a historical time-span. As the number of events recorded for operational processes is growing tremendously every year, so does the average event log size. Conventional process discovery techniques are not able to cope with such large data sets, i.e. they fail when the data does not fit main memory. Moreover, events are being generated at high rates, e.g. consider data originating from sensor networks, mobile devices and e-business applications. Since existing process discovery techniques use static data, they are not able to capture the dynamics of such event streams in an adequate manner.

In this paper, we focus on process discovery using streams of business process events, i.e. event streams, rather than event logs. Applying process discovery on event streams allows us to gain insights in the underlying business process in a live fashion. It furthermore allows us to deal with situations where: event logs are too large to fit main memory, there is no time to access event data continuously, i.e. real-time constraints, and recent behaviour is more important, i.e. concept drift. A large class of existing process discovery algorithms transforms the event log into an abstract representation, i.e. an abstraction of the event log, which is subsequently used to discover a process model. To adopt these algorithms in a streaming context, it suffices to approximate the abstract representation based on the event stream.
Using abstract representations has several advantages: (i) We reuse existing techniques by predominantly focusing on learning abstract representations from event streams. (ii) Once we design and implement a method for approximating a certain abstract representation, any (future) algorithm using the same abstract representation is automatically ported to event streams. (iii) In some cases, laws and regulations dictate that we are not allowed to store all event data. Some abstract representations ignore large parts of the data, effectively storing a summary of the actual event data, and therefore comply to anonymity regulations.

We present the Stream-Based Abstract Representation (S-BAR) architecture that describes this mechanism in a generic way (Fig. 1). An event stream S represents an (in)finite sequence of events, emitted over time. An event is represented by a (c, a)-pair, stating that activity a is executed in context of case c. We maintain a data structure (D_T) that represents the past behaviour emitted onto stream S. Each time a new event arrives the data structure is kept up to date by updating its previous state based on the newly received event (δ_{D_T}). From the data structure an algorithm-specific abstract representation (A_T) is deduced (λ_{D_T}^{A_T}). After learning the abstract representation, we reuse existing translations borrowed from conventional process discovery algorithms to return a process model (γ_{A_T}).

Figure 1: Schematic overview of the S-BAR architecture (event stream S → data structure update function δ_{D_T} → data structure D_T → abstract representation mapping λ_{D_T}^{A_T} → abstract representation A_T → discovery algorithm γ_{A_T} → process model M).

The S-BAR architecture is instantiated by designing a data structure, a data structure update mechanism and a data structure translation function. The actual implementation of the data structure and related update functions influences the behaviour described by the discovered process model, e.g. using a time-decaying data structure versus a data structure that approximates the most frequent cases on the stream. Several instantiations of the architecture have been implemented in the process mining toolkits ProM [19] and RapidProM [3, 10]. Using these implementations we conduct empirical experiments w.r.t. the behaviour of these algorithms in an event stream setting. The experiments show that the algorithms are able to capture the behaviour reflected by the event stream. Moreover, the experiments show that memory usage and processing times of the algorithms have non-increasing trends.

The remainder of this paper is organized as follows. In Section 2, we present background information regarding business processes and process discovery. In Section 3, we present event streams and the notion of event stream based process discovery. In Section 4, we introduce the S-BAR architecture. In Section 5, we provide several instantiations of the architecture. In Section 6, we present an empirical evaluation of several instantiations of the architecture. In Section 7, we present related work. In Section 8, we discuss general challenges in event stream based process discovery. Section 9 concludes the paper.
2 Preliminaries

In this section we present general notation used throughout the paper and background concepts regarding business processes and process discovery.

N denotes the set of positive integers; N_0 additionally includes 0. A multiset B over set X is a function B : X → N_0. We write a multiset as [e_1^{k_1}, e_2^{k_2}, ..., e_n^{k_n}], where for 1 ≤ i ≤ n we have e_i ∈ X, k_i ∈ N and e_i^{k_i} ≡ B(e_i) = k_i. If for element e, B(e) = 1, we omit its superscript. If for element e, B(e) = 0, we omit e from the multiset notation. An empty multiset is denoted as [ ]. Element inclusion applies to multisets, i.e. if e ∈ X and B(e) > 0 then e ∈ B.

A sequence σ of length n relates positions to elements e ∈ X, i.e. σ : {1, 2, ..., n} → X. An empty sequence is denoted as ε. We write every non-empty sequence as ⟨e_1, e_2, ..., e_n⟩. The set of all possible sequences over a set X is denoted as X*. We write concatenation of sequences σ_1 and σ_2 as σ_1 · σ_2.

Let X, Y, Z and Z′ be sets and let f : X → Y and g : Y → Z. Function composition of f and g is defined as g ∘ f : X → Z, with x ↦ g(f(x)) for x ∈ X. Moreover, given h : Z → Z′ we write h ∘ g ∘ f for h ∘ (g ∘ f), i.e. h ∘ g ∘ f : X → Z′, with x ↦ h(g(f(x))) for x ∈ X.

Business processes represent the execution of related business activities leading to a business goal. Consider a bank offering loans to its customers. A business goal of the bank is to accept, reject or cancel a loan application. The bank's employees and its enterprise information system execute activities to achieve this goal, e.g. by checking a client's credit history and assessing the loan risk. A business process P defines a set of sequences over a set of activities A, i.e. P ⊆ A*. If σ ∈ P then the sequence of business activities σ leads to a business goal and belongs to the behaviour of P.

Figure 2: BPMN model of a loan application process (adopted from [21]).
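The notation above maps directly onto standard library constructs; the following Python sketch is purely illustrative (the helper names `at` and `compose` are ours, not the paper's):

```python
from collections import Counter

# A multiset B over X as a Counter: B(e) gives the multiplicity of e.
B = Counter({"a": 2, "b": 1})        # the multiset [a^2, b]
assert B["a"] == 2 and B["c"] == 0   # absent elements have multiplicity 0

# A sequence sigma of length n maps positions 1..n to elements; a tuple
# with 1-based access mimics this.
sigma = ("a", "b", "a")

def at(seq, i):
    # sigma(i), 1-based as in the text
    return seq[i - 1]

assert at(sigma, 1) == "a"

# Function composition g . f : X -> Z, with x |-> g(f(x)).
def compose(g, f):
    return lambda x: g(f(x))

assert compose(lambda y: y + 1, lambda x: 2 * x)(3) == 7
```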
In this paper, we assume the execution of activities to be atomic and abstract from data attributes such as resource, time-stamp etc. Hence, we only consider the sequential ordering of activities (the control-flow perspective). U_P denotes the universe of business processes.

A process model M represents a business process and, like a process, defines a set of sequences over a set of activities A, i.e. M ⊆ A*. U_M denotes the universe of process models. In this paper, we consider process models that describe behaviour in a deterministic manner, e.g. Petri nets [36], BPMN [38] and workflow nets [1]. Consider the BPMN model of a loan application handling process in Fig. 2. It describes that after an application is received, the first activity to be executed is "Check application completeness". Depending upon the completeness of the application, the corresponding form is "Returned back to the applicant", or, the client's "credit history is checked" and subsequently a "loan risk assessment" is performed. The two aforementioned activities can be executed concurrently with the "appraise property" activity. An "eligibility assessment" of the loan is performed, eventually leading to a rejection, cancellation or approval of the loan.

Today's information systems track the execution of business processes within a company. Such systems store the execution of activities in context of a case, i.e. an instance of the process. The data stored by the information system is often in the form of an event log. Consider Table 1 as an example.

Table 1: Fragment of an event log (columns: Case, Activity, Resource, Time-stamp; contents omitted).

The execution of an activity in context of a case, e.g. Approve application executed for a case, is referred to as an event. A sequence of events, e.g. the sequence of events related to a case, ⟨Check application form completeness, Check credit history, ..., Approve application⟩, is referred to as a trace (also written using abbreviated activity names).

An event log L is a multiset of sequences over a set of activities A, i.e. L : A* → N_0, and describes the execution of some P ∈ U_P. U_L denotes the universe of event logs. An event log is a sample of the underlying process. Therefore, there might exist process behaviour that is not present in the event log, e.g. caused by parallelism. In such a case an event log is incomplete. There might also exist traces in the event log that are not part of the process, i.e. noisy traces. Noisy traces can be caused by faulty execution of the process, incomplete specifications or technical issues such as incorrect logging, system errors and mixed time granularity.

The goal of process discovery is to discover a process model based on an event log. Several process discovery algorithms exist [4, 5, 24, 30, 45, 48]. These algorithms differ in terms of their underlying computational schemes and data structures as well as their resulting process modeling formalism. We refer to [2, 20, 44] for a detailed overview of process discovery algorithms.

A process discovery algorithm γ_L discovers a process model based on an event log, i.e. γ_L : U_L → U_M. The challenge is to design γ_L in such a way that γ_L(L) is an appropriate representation of the underlying process P. Appropriateness of γ_L(L) depends on the aim of the process discovery analysis, e.g. ensuring that all behaviour in the event log is present in the model versus ensuring that the most frequent behaviour is present. Given the different aims of process discovery analyses, several quality measures are defined in order to judge their resulting model's appropriateness.
Ideally, P is used as a basis to compute these metrics; however, as L is the only tangible sample of P, we typically compute the quality of γ_L(L) using L. The four essential process mining quality dimensions are replay fitness, precision, simplicity and generalization [2, 12]. Replay fitness describes what fraction of the behaviour present in L is also described by γ_L(L). Precision describes what fraction of the behaviour described by γ_L(L) is also present in L. Simplicity describes the (perceived) complexity of the process model. Since it is unlikely that the event log contains all behaviour (incompleteness), generalization describes how well the process model generalizes for behaviour not present in L. Due to noise, an algorithm guaranteeing perfect replay fitness, i.e. all behaviour in the event log is present in the discovered model, captures behaviour that is not part of the process. In practice this leads to very complex models that are impossible to be interpreted by a human analyst. Hence, a process discovery algorithm needs to strike an adequate balance between the four essential quality dimensions.

3 Event Stream Based Process Discovery

Figure 3: Example event stream S = ⟨(3, a), (4, c), (5, c), (5, r), (5, r), (6, c), (4, c), (5, a), ...⟩.

Existing process discovery techniques discover process models in an a-posteriori fashion, i.e. they provide a historical view of the data. However, most information systems allow us to capture the execution of activities at the moment they occur. Discovering and analysing process models from such continuous streams of events allows us to get a real-time view of the process under study. Such a view paves the way for new types of process mining analysis, i.e. we are able to answer more advanced questions such as "What is the current status of the process?" and "What running cases are likely to cause problems?". It also allows us to inspect and visualize recent behaviour and evolution of behaviour in the process, i.e. concept drift.

There are several other advantages of studying streams of events rather than event logs. Trends such as Big Data and Data Science signify the spectacular growth and omnipresence of data. Typically, real event logs do not fit main memory. Since we assume event streams to be potentially infinite, analysing them enables us to handle event data of arbitrary size. In other cases we do not have the time or are not allowed to access event data continuously and, hence, need to analyse events at the moment they occur.

In this section we formalize event streams and event stream based process discovery. Additionally we quantify high-level requirements for the design of event stream based process discovery algorithms.
An event stream is a continuous stream of events executed in context of an underlying business process. We represent an event stream as a sequence of pairs consisting of a case-identifier and an activity. Hence, for each event we know what activity was performed in context of what process instance. When comparing event streams to event logs, we identify two main differences: an event stream is potentially infinite, and behaviour seen for a case is incomplete, i.e. in the future new events may be executed in context of a case.
Definition 1 (Event stream)
Let A be a set of activities and let C denote the set of all possible case identifiers. An event stream S is a sequence over C × A, i.e. S ∈ (C × A)*.

A pair (c, a) ∈ C × A represents an event, i.e. activity a was executed in context of case c. S(1) denotes the first event that we receive, whereas S(i) denotes the i-th event. Consider stream S in Fig. 3 as an example, where event (3, a) is emitted first (S(1) = (3, a)), event (4, c) is emitted second and event (5, a) is the eighth and last event emitted onto the stream up until now. We receive multiple events related to the same case at different points in time, e.g. the second and seventh event on S are related to case 4. Hence, handling such type of data needs new types of data structures and event processing techniques compared to conventional process discovery.

The goal of event stream based process discovery is to discover a process model using an event stream as an input. A first step is to approximate, based on S, the presence of some σ ∈ P and possibly σ's frequency w.r.t. S. Given such an approximation the next step is to deploy a process discovery algorithm onto the approximation in order to obtain a process model.

A naive approach is to construct an event log based on the event stream by using a data structure that stores case-sequence pairs (c, σ) ∈ C × A*. For every event (c, a) we receive, we check whether the data structure contains entry (c, σ′). If so, we update this entry to (c, σ′ · ⟨a⟩). If not, we insert new entry (c, ⟨a⟩). Whenever we want to discover a new process model based on the current state of the event stream, we transform the data structure into an event log and provide it to any conventional process discovery algorithm. Observe that, since the stream is potentially infinite, this procedure needs infinite memory. Moreover, the approach includes redundancy, i.e. several (partial) traces that were already analysed in a previous call to a discovery algorithm, and are still in memory at the next call, are analysed twice. Hence, we want the data structure to either represent, or be easily translatable to, some minimal form of data needed in order to discover a process model.

An example of an algorithm using a minimal data representation is the flower miner. The flower miner produces a process model that allows for every possible sequence over the observed activities. Reconsider example stream S (Fig. 3), which consists of the activities a, c and r. In Fig. 4 we depict a flower model, in terms of a Petri net [36], that allows for all activities on S.

Figure 4: Example "flower" model (a single place p with transitions for a, c and r looping on it).

To ensure that the flower miner uses finite memory, we just need to deploy any finite memory based data structure that keeps track of the activities seen on the stream. A wide variety of such data structures exists, e.g. count-based frequent item data structures [16], reservoirs [6, 43] and time-decay based models [17]. Whenever we receive a new event (c, a) we just add a to the data structure. Translating the data structure to a process model is trivial, i.e. every activity present in the data structure is adopted in the flower model.

The flower miner works, yet it has deficiencies from a process discovery perspective. It generalizes the behaviour represented by the event stream as much as possible. The resulting process model very likely allows for much more behaviour than actually present in the underlying process. Hence, we need techniques that are more precise.

The event log based approach and the flower miner represent two extremes. Storing the event stream as an event log requires us to reuse a large part of the data several times. The flower miner on the other hand neglects a large quantity of information carried by the event stream and greatly over-generalizes the stream's behaviour.
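The naive case-sequence store described above is easy to write down, which also makes its unbounded growth visible; a minimal Python sketch (the `receive` helper is illustrative):

```python
# Naive stream-to-event-log construction: store (c, sigma) pairs and extend
# sigma whenever a new (c, a) event arrives. Memory grows with the number of
# cases and events seen, hence unbounded on an infinite stream.
store = {}

def receive(c, a):
    store.setdefault(c, []).append(a)

for event in [(3, "a"), (4, "c"), (5, "c"), (5, "r"), (5, "r"),
              (6, "c"), (4, "c"), (5, "a")]:   # the stream of Fig. 3
    receive(*event)

# The store now holds one (growing) partial trace per case, e.g. case 5:
assert store[5] == ["c", "r", "r", "a"]
assert store[4] == ["c", "c"]
```

Every case ever seen keeps a growing entry in `store`, which is exactly why this procedure needs infinite memory on an infinite stream.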
We therefore need a scheme that is in the middle of both extremes, i.e. it does not store the complete event log, yet it stores enough data to provide meaningful output.

4 The S-BAR Architecture

When analysing conventional process discovery algorithms, we observe that a majority shares a common underlying algorithmic mechanism. The event log is transformed into an abstract representation, which is subsequently used to construct a resulting process model. Moreover, several algorithms use the same abstract representation. In Example 1 we illustrate the directly follows abstraction, used by the α-Miner [5].

Example 1 (The directly follows abstraction & the α-Miner) Consider event log L = [⟨a, b, c, d⟩, ⟨a, c, b, d⟩]. The α-Miner computes a directly follows abstraction based on the event log. Activity a is directly followed by b, written as a > b, if there exists some sequence σ ∈ L of the form σ = σ′ · ⟨a, b⟩ · σ′′. In case of event log L we deduce a > b, a > c, b > c, b > d, c > b, c > d. Using these relations as a basis, the α-Miner constructs a Petri net.

As Example 1 shows, the event log is translated into a directly follows abstraction, which is subsequently used to construct a process model. Other discovery algorithms like the Inductive Miner [30] and the ILP Miner [48] use the same mechanism to discover a process model. To adopt these algorithms to an event stream context, it suffices to determine whether we are able to learn the corresponding abstract representation from the event stream and, if possible, design a data structure that supports this.

In the remainder of this section we formalize the notion of abstract representations. Subsequently we introduce the Stream-Based Abstract Representation (S-BAR) architecture that captures the notion of event stream based abstract representation computation in a generic manner.
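As an illustration, the directly follows abstraction of Example 1 can be computed in a single pass over the traces (the function name `directly_follows` is ours, not the paper's):

```python
# Directly follows abstraction: a > b iff some trace contains <..., a, b, ...>.
def directly_follows(log):
    rel = set()
    for trace in log:
        for a, b in zip(trace, trace[1:]):   # all adjacent activity pairs
            rel.add((a, b))
    return rel

L = [("a", "b", "c", "d"), ("a", "c", "b", "d")]   # event log of Example 1
assert directly_follows(L) == {("a", "b"), ("a", "c"), ("b", "c"),
                               ("b", "d"), ("c", "b"), ("c", "d")}
```

This reproduces exactly the six relations deduced in Example 1.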
We refine conventional process discovery by splitting γ_L into two steps. In the first step, the event log is translated into the abstraction used by the discovery algorithm. In the second step, the abstraction is translated into a process model. In the remainder we let T denote an abstract representation type. A_T denotes an abstract representation of type T and U_{A_T} denotes the universe of abstract representations of type T.

Figure 5: The α-Miner in terms of its abstract representation (event log L ∈ U_L → λ_L^{A_T} → directly follows abstraction → γ_{A_T} → process model).

Definition 2 (Abstraction Function - Event Log)
Let T denote an abstract representation type. An abstraction function λ_L^{A_T} is a function mapping an event log to an abstract representation of type T:

λ_L^{A_T} : U_L → U_{A_T}    (1)

Using Definition 2, we define process discovery in terms of abstract representations.

Definition 3 (Process Discovery Algorithm - Abstract Representation)
Let T denote an abstract representation type. An abstract representation based process discovery algorithm γ_{A_T} maps an abstract representation of type T to a process model:

γ_{A_T} : U_{A_T} → U_M    (2)

Every discovery algorithm that uses an abstract representation internally can be expressed as a composition of λ_L^{A_T} and γ_{A_T}. Thus, given event log L ∈ U_L and abstract representation type T, we obtain γ_L(L) = (γ_{A_T} ∘ λ_L^{A_T})(L). For example, consider Fig. 5, depicting the α-Miner in terms of γ_{A_T} and λ_L^{A_T}.

In this section we present the S-BAR architecture, which captures the use of abstract representations in an event stream context in a generic manner. In Fig. 6 the S-BAR architecture is depicted schematically. The S-BAR architecture conceptually splits event-stream-based process discovery into three components, highlighted in gray in Fig. 6. We explain the purpose of each component, i.e. δ_{D_T}, λ_{D_T}^{A_T} and γ_{A_T}, by means of an example.

Consider maintaining the directly follows abstraction, introduced in Example 1, on a stream. To do this, we need a data structure that tracks the most recent activity for each case. Given such a data structure, if we receive new event (c, a), we check whether we already received an activity a′ for case c or whether a is the first activity received for case c. If we already received activity a′ for case c, we deduce a′ > a. Subsequently we update our data structure such that it now assigns a to be the last activity received for case c.

Figure 6: Detailed overview of the S-BAR architecture (for each event S(i) = (c_i, a_i), the update δ_{D_T} yields data structure D_T^i, from which λ_{D_T}^{A_T} derives abstract representation A_T^i and γ_{A_T} a process model M_i).

The first component, i.e. δ_{D_T}, maintains and updates a (collection of) data structure(s) that together form a sufficient representation of the behaviour entailed by the event stream.
In context of our example, the first component is mainly concerned with keeping track of pairs of activities that are in an a′ > a relation. The second component, i.e. λ_{D_T}^{A_T}, translates the data structure to an abstract representation. In context of our example, this consists of translating the pairs of activities that are in an a′ > a relation into the directly follows abstraction. The third component, i.e. γ_{A_T}, translates the abstract representation to a process model and is inherited from conventional process discovery.

In the remainder, given an arbitrary data structure type T, we let U_{D_T} denote the universe of data structures of type T. A data type T might refer to an array or a (collection of) hash table(s), yet it might also refer to some implementation of a stream-based frequent-item approximation algorithm such as Lossy Counting [33]. We assume any D_T ∈ U_{D_T} to use finite memory.

Definition 4 (Data Structure Update Function)
Let A be a set of activities and let C denote the set of all possible case identifiers. We define a data structure update function δ_{D_T} as:

δ_{D_T} : U_{D_T} × C × A → U_{D_T}    (3)

The data structure update function δ_{D_T} allows us to update a given data structure D_T ∈ U_{D_T} based on any newly arrived event. In practice the function typically consists of two components. One component keeps track of the cases that were already active before and maps them in some way to a second (collection of) data structure(s). Such a second component allows us to construct the abstract representation. Thus, when abstracting this mechanism, given some event stream based data structure, we need a mechanism to translate the data structure, i.e. the range of δ_{D_T}, to an abstract representation.

Definition 5 (Abstraction Function - Data Structure) An abstraction function λ_{D_T}^{A_T} is a function mapping a data structure of type T to an abstract representation of type T:

λ_{D_T}^{A_T} : U_{D_T} → U_{A_T}    (4)

Ideally, translating the data structure is computationally inexpensive. However, in some cases translating the data structure to the intended abstract representation might be expensive. This is acceptable, as long as we (re)compute the abstraction in a periodic fashion or at the user's request.

Assume that we have seen i ≥ 1 events emitted onto S and let D_T^i ∈ U_{D_T} denote the data structure that approximates the behaviour in the event stream S after receiving i events. When new event (c, a) ∈ C × A arrives, we are able to discover a new process model M_{i+1} by applying (γ_{A_T} ∘ λ_{D_T}^{A_T} ∘ δ_{D_T})(D_T^i, c, a). In practice, δ_{D_T} is applied continuously and whenever, after receiving a new i-th event, we are interested in finding a process model we apply (γ_{A_T} ∘ λ_{D_T}^{A_T})(D_T^i) to obtain the process model.

The main challenge in instantiating the framework is designing a data structure D_T ∈ U_{D_T} that allows us to approximate an abstract representation together with accompanying δ_{D_T} and λ_{D_T}^{A_T} functions.
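The composition (γ_{A_T} ∘ λ_{D_T}^{A_T} ∘ δ_{D_T}) can be phrased as a small generic driver; everything below is an illustrative sketch, with a stand-in `gamma` that returns the sorted relation instead of an actual process model:

```python
# Generic S-BAR driver: delta updates the data structure per event, lam maps
# the data structure to an abstract representation, gamma discovers a model.
def sbar(stream, d0, delta, lam, gamma):
    d = d0
    for event in stream:
        d = delta(d, event)   # applied continuously, once per event
    return gamma(lam(d))      # applied on request

# Instantiation sketch: directly follows via an (unbounded) last-activity map.
def delta(d, event):
    last, rel = d
    c, a = event
    if c in last:
        rel.add((last[c], a))   # deduce last(c) > a
    return {**last, c: a}, rel  # a is now the last activity of case c

model = sbar([(1, "a"), (1, "b"), (2, "a"), (2, "c")],
             ({}, set()),
             delta,
             lam=lambda d: d[1],             # abstraction: the > relation
             gamma=lambda rel: sorted(rel))  # stand-in for a real miner
assert model == [("a", "b"), ("a", "c")]
```

The next section replaces the unbounded `last` map of this sketch with finite-memory approximations.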
5 Instantiating the S-BAR Architecture

In this section, we show the applicability of the S-BAR framework by presenting several instantiations for different existing process discovery algorithms. A large class of algorithms, e.g. the α-Miner [5], the Heuristics Miner [45, 47] and the Inductive Miner [30], is based on the directly follows abstraction. Therefore, we first present how to compute this abstraction. Subsequently we highlight, for each algorithm using the directly follows abstraction as a basis, the main changes and/or extensions that need to be applied w.r.t. the basic scheme. To illustrate the generality of the architecture, we also show a completely different class of discovery approaches, i.e. region-based techniques [4, 48]. These techniques work fundamentally different compared to the aforementioned class of algorithms and use different abstract representations.

The directly follows abstraction describes pairs of activities (a, b), written as a > b, if there exists some sequence σ ∈ L of the form σ = σ′ · ⟨a, b⟩ · σ′′. To approximate the relation, we let data structure D_T ∈ U_{D_T} consist of two internal data structures D_C and D_A. Within D_C we store (case, activity)-pairs, i.e. (c, a) ∈ C × A, that represent the last activity a seen for case c. Within D_A we store (activity, activity)-pairs (a, a′) ∈ A × A, where (a, a′) ∈ D_A ⇔ a > a′.

The basic scheme works as follows. When a new event (c, a) arrives, we check whether D_C already contains some pair (c, a′). If so, we add (a′, a) to D_A, remove (c, a′) from D_C and add (c, a) to D_C. If not, we just add (c, a) to D_C. D_A represents the directly follows abstraction by means of a collection of pairs; thus, function λ_{D_T}^{A_T} consists of translating D_A to the appropriate underlying data type used by the discovery algorithm of choice.

As an example consider Algorithm 1 and Algorithm 2, describing a design of D_C based on the Space Saving algorithm [35] and Lossy Counting [33] respectively. Both algorithms have three inputs, i.e. a maximum size k ∈ N, an event stream S ∈ (C × A)* and a finite memory data structure implementing D_A. The algorithms maintain a set of (case, activity)-pairs X, initialized to ∅ (line 1). For each case c present in X an associated counter v_c is maintained which is used for memory management. When a new event (c, a) appears on the event stream, the algorithms check whether some pair (c′, a′) s.t. c = c′ is stored in X (line 5). If this is the case, c's counter is increased, (a′, a) is added to data structure D_A and (c, a′) is replaced by (c, a) in X (lines 6-8). The algorithms differ in the way they process events (c, a) for which ∄(c′, a′) ∈ X (c′ = c).

Algorithm 1: D_C (Space Saving)
  input: k ∈ N, S ∈ (C × A)*, D_A
  begin
  1:  X ← ∅; i ← 0
  2:  while true do
  3:    i ← i + 1
  4:    (c, a) ← S(i)
  5:    if ∃(c′, a′) ∈ X (c′ = c) then
  6:      v_c ← v_c + 1
  7:      D_A ⊎ {(a′, a)}
  8:      X ← (X ∪ {(c, a)}) \ {(c, a′)}
  9:    else if |X| < k then
  10:     X ← X ∪ {(c, a)}
  11:     v_c ← 1
  12:   else
  13:     (c′, a′) ← arg min_{(c′,a′) ∈ X} (v_{c′})
  14:     v_c ← v_{c′} + 1
  15:     X ← (X ∪ {(c, a)}) \ {(c′, a′)}

Algorithm 2: D_C (Lossy Counting)
  input: k ∈ N, S ∈ (C × A)*, D_A
  begin
  1:  i ← 0; ∆ ← 0; X ← ∅
  2:  while true do
  3:    i ← i + 1
  4:    (c, a) ← S(i)
  5:    if ∃(c′, a′) ∈ X (c′ = c) then
  6:      v_c ← v_c + 1
  7:      D_A ⊎ {(a′, a)}
  8:      X ← (X ∪ {(c, a)}) \ {(c, a′)}
  9:    else
  10:     X ← X ∪ {(c, a)}
  11:     v_c ← ∆
  12:   if ⌊i/k⌋ ≠ ∆ then
  13:     foreach (c′, a′) ∈ X do
  14:       if v_{c′} ≤ ∆ then
  15:         X ← X \ {(c′, a′)}
  16:     ∆ ← ⌊i/k⌋
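For concreteness, the Lossy Counting variant (Algorithm 2) can be sketched in runnable Python; representing D_A by an `emit` callback is our simplification, not the paper's design:

```python
# D_C based on Lossy Counting: after every block of k events, drop the cases
# whose counter is at most the block number Delta. emit(a_prev, a) forwards a
# directly-follows pair to D_A.
def lossy_dc(stream, k, emit):
    X, v = {}, {}       # X: case -> last activity, v: case -> counter
    delta = 0
    for i, (c, a) in enumerate(stream, start=1):
        if c in X:                      # case already tracked
            v[c] += 1
            emit(X[c], a)               # deduce X[c] > a
            X[c] = a
        else:                           # new (or forgotten) case
            X[c] = a
            v[c] = delta
        if i // k != delta:             # end of a block: clean up
            for c2 in [c2 for c2 in X if v[c2] <= delta]:
                del X[c2], v[c2]
            delta = i // k
    return X

pairs = []
lossy_dc([(3, "a"), (4, "c"), (5, "c"), (5, "r"), (5, "r"),
          (6, "c"), (4, "c"), (5, "a")],          # the stream of Fig. 3
         k=4, emit=lambda a, b: pairs.append((a, b)))
assert pairs == [("c", "r"), ("r", "r"), ("r", "a")]
```

Note that the second (4, c) event is treated as a new case here: the pair (4, c) was evicted at the end of the first block, illustrating how the approximation can lose directly-follows pairs for infrequent cases.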
The Space Saving based algorithm (Algorithm 1) either adds the element to X if |X| < k, or replaces the pair (c′, a′) ∈ X with the lowest corresponding counter (v_{c′}) value (Algorithm 1, lines 9-15). The Lossy Counting based algorithm cleans up its X-set after each block of k consecutive events and removes all those entries that have a counter value lower than variable ∆ (lines 9-16).

Both algorithms insert a new element in data structure D_A in line 7. Conceptually, the algorithms generate a stream of (activity, activity)-pairs. Hence, in Algorithm 3 we present a basic design for D_A based on the Frequent Algorithm [18, 28], which uses an activity pair stream S_A ∈ (A × A)* as an input. Thus, D_A ⊎ {(a′, a)} in line 7 of Algorithms 1 and 2 represents adding pair (a′, a) at the end of stream S_A.

Algorithm 3: D_A (Frequent)
  input: k ∈ N, S_A ∈ (A × A)*
  begin
  1:  X ← ∅; i ← 0
  2:  while true do
  3:    i ← i + 1
  4:    (a, a′) ← S_A(i)
  5:    if (a, a′) ∈ X then
  6:      v_{(a,a′)} ← v_{(a,a′)} + 1
  7:    else if |X| < k then
  8:      X ← X ∪ {(a, a′)}
  9:      v_{(a,a′)} ← 1
  10:   else
  11:     foreach (x, y) ∈ X do
  12:       v_{(x,y)} ← v_{(x,y)} − 1
  13:       if v_{(x,y)} = 0 then X ← X \ {(x, y)}

The algorithm stores pairs of activities in its internal set X. Whenever a new pair (a, a′) arrives, the algorithm checks if it is already present in X; if so, it updates the corresponding counter v_{(a,a′)}. If the pair is not yet present in X, the size of X is evaluated. If |X| < k the new pair is added to X and a new counter is created for the pair. If |X| ≥ k the new pair is not added; moreover, each counter is decreased by one and if a counter gets value 0 the corresponding pair is removed.

The general mechanism of Algorithm 3 is very similar to Algorithm 1. The main difference consists of how to update X when |X| ≥ k. All three algorithms use a parameter k which, in a way, represents the (maximum) size of X. Hence, when we write |D_C| or |D_A|, we implicitly refer to the value of k.
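In the same style, a sketch of the Frequent-algorithm-based D_A (Algorithm 3); here the dictionary `v` doubles as the set X (its keys) and the counters:

```python
# D_A based on the Frequent algorithm: count at most k distinct
# (activity, activity) pairs; when full, decrement all counters and evict
# those that reach zero.
def frequent_da(pair_stream, k):
    v = {}                     # pair -> counter; v.keys() plays the role of X
    for p in pair_stream:
        if p in v:             # pair already counted
            v[p] += 1
        elif len(v) < k:       # room left: start a new counter
            v[p] = 1
        else:                  # full: decrement everyone, evict zeros
            for q in list(v):
                v[q] -= 1
                if v[q] == 0:
                    del v[q]
    return v

counts = frequent_da([("a", "b"), ("a", "b"), ("b", "c"), ("a", "b")], k=2)
assert counts == {("a", "b"): 3, ("b", "c"): 1}
```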
It should be clear that we are also able to implement D_C based on the Frequent Algorithm, i.e. we just adopt a different updating mechanism for X. Likewise, we are able to design D_A based on the Space Saving/Lossy Counting algorithm. Moreover, for D_C we are able to use other types of stream-aware data structures, i.e. techniques adopting a different scheme to ensure finite memory. Examples of such techniques are Reservoir Sampling [6] and Decay Based Schemes [17]. In the next sections we briefly explain how the α-Miner, Heuristics Miner and Inductive Miner use the directly follows abstraction and what changes to the base scheme must be applied in order to adopt them in a streaming setting.

α-Miner

The α-Miner [5] transforms the directly follows abstraction into a Petri net. When adopting the α-Miner to an event stream context, we directly adopt the scheme described in the previous section. However, the algorithm explicitly needs a set of start- and end activities.

Approximating the start activities seems rather simple, i.e. whenever we receive a new case, the corresponding activity represents a start activity. However, given that we at some point remove (case, activity)-pairs from D_C, we might falsely designate some activities as start activities, i.e. a new case may in fact refer to a previously removed case. Approximating the end activities is more complex, as we are often not aware when a case terminates. A potential solution is to apply a warm-up period in which we try to observe cases that seem to be terminated, e.g. by identifying cases that have long periods of inactivity or by assuming that cases that are dropped out of D_C are terminated. However, since we approximate case termination, this approach may lead to falsely selecting certain activities as end activities.

We can also deduce start- and end activities from the directly follows abstraction.
A start activity is an a ∈ A with ∄a' ∈ A (a' ≠ a ∧ a' > a) and an end activity is an a ∈ A with ∄a' ∈ A (a' ≠ a ∧ a > a'). This works if these activities are only executed once at the beginning, respectively the end, of the process. In case of loops or multiple executions of start/end activities within the process, we potentially falsely neglect certain activities as being either start and/or end activities. In Section 8.2, we discuss this problem in depth.

Heuristics Miner

The Heuristics Miner [45, 46, 47] is designed to cope with noise in event logs. To do this, it effectively counts the number of occurrences of activities, as well as the >-relation. Based on the directly follows abstraction it computes a derived metric a ⇒ b = (|a > b| − |b > a|) / (|a > b| + |b > a| + 1) that describes the relative causality between two tasks a and b (|a > b| denotes the number of occurrences of a > b). The basic scheme presented in Section 5.1 suffices for computing a ⇒ b, as long as D_A explicitly tracks, or approximates, the frequencies of its elements (in the scheme this is achieved by the internal counters).

Inductive Miner

The Inductive Miner [30], like the α-Miner, uses the directly follows abstraction and start and end activities. It tries to find patterns within the directly follows abstraction that indicate certain behaviour, e.g. parallelism. Using these patterns it splits the event log into several smaller logs and repeats the procedure. Due to its iterative nature, the Inductive Miner guarantees to find sound workflow nets [1]. The Inductive Miner has also been extended to handle noise and/or infrequent behaviour [29]. This requires, like the Heuristics Miner, counting the >-relation. In [31], a version of the Inductive Miner is presented in which the inductive steps are directly performed on the directly follows abstraction. In the context of event streams this is the most adequate version to use, as we only need to maintain a (counted) directly follows abstraction.
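The start/end deduction and the Heuristics Miner dependency measure can be sketched over a counted directly-follows abstraction (as approximated by D_A). A minimal sketch; the function names are illustrative, and the self-loop exclusion (a' ≠ a) follows the condition above.

```python
def start_end_candidates(df_counts, activities):
    """Derive candidate start/end activities from a counted directly-follows
    abstraction: a start activity has no predecessor other than itself,
    an end activity has no successor other than itself."""
    preds = {b for (a, b) in df_counts if a != b}  # activities with a predecessor
    succs = {a for (a, b) in df_counts if a != b}  # activities with a successor
    starts = {a for a in activities if a not in preds}
    ends = {a for a in activities if a not in succs}
    return starts, ends

def causality(df_counts, a, b):
    """Heuristics Miner dependency measure:
    a => b = (|a > b| - |b > a|) / (|a > b| + |b > a| + 1)."""
    ab = df_counts.get((a, b), 0)
    ba = df_counts.get((b, a), 0)
    return (ab - ba) / (ab + ba + 1)
```

Here `df_counts` maps each observed pair (a, b) to its (approximated) frequency, as maintained by the internal counters of D_A.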
Several process discovery algorithms [4, 9, 15, 48, 50] are based on region theory, which solves the Petri net synthesis problem [8]. Classical region theory techniques ensure strict formal properties for the resulting process models. Process discovery algorithms based on region theory relax these properties. We identify two different region theory approaches, i.e. language-based and state-based region theory, which use different forms of abstract representations.
Algorithms based on language-based region theory [9, 48] rely on a prefix-closure of the input event log, i.e. the set of all prefixes of all traces. It is trivial to adapt the scheme presented to compute the directly follows abstraction (Section 5.1) to prefix-closures. Instead of storing (case, activity)-pairs in D_C, we store pairs (c, σ) ∈ C × A*. We additionally use a data structure D_pc which approximates the prefix-closure. Whenever we receive an event (c, a), we look for a pair (c, σ) ∈ D_C. If such a pair exists, we subsequently add σ' = σ · ⟨a⟩ to D_pc and update (c, σ) to (c, σ'). If there is no such pair (c, σ), we add ε and ⟨a⟩ to D_pc and (c, ⟨a⟩) to D_C. In case of [48], which uses Integer Linear Programming where (an abstraction of) the prefix-closure forms the constraint body, we simply store the constraints in D_pc, rather than the prefix-closure.

Within process discovery based on state-based regions [4], a transition system is constructed based on a view of a trace. Examples of a view are the complete prefix of the trace, the multiset projection of the prefix, etc. The future of a trace can be used as well, i.e. given an event within a trace, the future of the event are all events happening after the event. However, future-based views are not applicable in an event stream setting, as the future is unknown.

As an example of a transition system based on a simple event log L = [⟨a, b, c, d⟩, ⟨a, c, b, d⟩], consider Fig. 7. In Fig. 7a states are represented by a multiset view of the prefixes of the traces, i.e. the state is determined by the multiset of activities seen before. Activities make up the transitions within the system, i.e. the first activity in both traces is a, thus the empty multiset is connected to multiset [a] by means of a transition labelled a. In Fig. 7a we do not limit the maximum size of the multisets. Fig. 7b shows a set view of the traces with a maximum set size of 1.
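The prefix-closure maintenance for the language-based approach above can be sketched as follows. A minimal sketch in which traces are tuples; in a real instantiation both D_C and D_pc would additionally be bounded by a stream-aware scheme such as Lossy Counting, which is omitted here.

```python
def update_prefix_closure(d_c, d_pc, case, activity):
    """Update the approximated prefix-closure D_pc on event (case, activity).
    d_c maps a case c to its currently known prefix sigma;
    d_pc is the set of prefixes approximating the prefix-closure."""
    if case in d_c:
        prefix = d_c[case] + (activity,)   # sigma' = sigma . <a>
    else:
        d_pc.add(())                       # the empty prefix (epsilon)
        prefix = (activity,)               # <a>
    d_pc.add(prefix)
    d_c[case] = prefix                     # update (c, sigma) to (c, sigma')
```

For the ILP-based approach of [48], the same update point would store the derived constraints instead of the prefix itself.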
Again the empty set is connected with set {a} by means of a transition labelled a. For trace ⟨a, b, c, d⟩, for example, the second activity is b and thus state {a} has an outgoing transition labelled b to state {b}. This is the case, i.e. a connection to state {b} rather than {a, b}, due to the size restriction of 1.

Consider the following scheme, similar to the scheme presented in Section 5.1.

[Figure 7: Example transition systems based on L = [⟨a, b, c, d⟩, ⟨a, c, b, d⟩]; (a) multiset abstraction (unbounded), (b) set abstraction (max. set size 1).]

Given a view type V, e.g. a set view, we design D_C to maintain pairs (c, v_c), s.t. v_c is the last view constructed for case c. Moreover, we maintain a collection of views D_V. Updating D_V is straightforward. Given a new event (c, a), based on v_c we compute some new view v'_c, add it to D_V and update (c, v_c) to (c, v'_c) in D_C, e.g. updating the size-1 set view means that the new view based on new event (c, a) is simply the set {a}. However, just maintaining size-1 sets in D_V does not suffice, as the relations between those sets, i.e. the transitions in the transition system, are not present in D_V.

The problem is fixed by maintaining the transition system in memory, rather than D_V, and updating it directly when we receive new events. Given some latest view v_c for case c, i.e. (c, v_c) ∈ D_C, activity a of new event (c, a) represents the transition from v_c to the newly derived v'_c. Without a limit on the view size, translating the transition system into a Petri net is rather slow. Hence, in a streaming setting we limit the maximum size of the views. This, in turn, causes some challenges w.r.t. D_C and the translation function λ_{A_T}^{D_T}.
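Maintaining the transition system directly in memory for a bounded set view can be sketched as follows. A minimal sketch, assuming a size-k set view; the class name is illustrative. Note that the last k events of each case are stored in order (a deque), so the new view after an event is derived unambiguously.

```python
from collections import deque

class StreamTransitionSystem:
    """Maintain a transition system in memory for a size-k set view of traces.
    Each case keeps its last k events in order, so that the view after a new
    event (with the oldest event falling out) is well defined."""

    def __init__(self, k):
        self.k = k
        self.window = {}          # case -> deque of the last k activities
        self.transitions = set()  # (source_view, activity, target_view)

    def receive(self, case, activity):
        w = self.window.setdefault(case, deque(maxlen=self.k))
        source = frozenset(w)     # current (set) view of the case
        w.append(activity)        # deque with maxlen drops the oldest event
        target = frozenset(w)     # new view after the event
        self.transitions.add((source, activity, target))
```

With k = 1 and the example log L above, this yields exactly the transitions of Fig. 7b, e.g. {a} connected to {b} by a transition labelled b.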
Consider the case where we maintain a multiset/set view of traces with some arbitrary finite capacity k. Moreover, given k = 2, assume we receive event (c, a) and (c, {a', a''}) ∈ D_C. The question is whether the new view for c is {a, a'} or {a, a''}. Only if we store the last two events observed for c, in order, are we able to answer this question, i.e. if (c, ⟨a', a''⟩) ∈ D_C we deduce the new view to be {a, a''}. Finally, note that when we aim at removing paths from the transition system, for example when we remove case c from D_C, we need to store the whole trace for c in order to be able to remove all states and transitions related to case c.

In this section we present an evaluation of several instantiations of the architecture. We also consider performance aspects of the implementation. All five algorithms, i.e. the α-Miner, Heuristics Miner, Inductive Miner, ILP Miner (language-based regions) and Transition System Miner (state-based regions), have been implemented using the schemes presented in Section 5 in the ProM [19] framework (http://promtools.org). ProM is the de facto standard academic tool-kit for process mining algorithms and is additionally used by practitioners in the field. Some of the implementations are ported to RapidProM [3], i.e. a plugin of RapidMiner, which allows for designing large-scale repetitive experiments by means of scientific workflows [10]. Source code of the implementations is available via the Stream-related packages within the ProM code base, i.e.
StreamAbstractRepresentation, StreamAlphaMiner, StreamHeuristicsMiner, StreamILPMiner, StreamInductiveMiner and StreamTransitionSystemsMiner (code for a package X is located at http://svn.win.tue.nl/repos/prom/Packages/X). Experiment results, event streams and the generating process models used are available at https://github.com/s-j-v-zelst/research/releases/download/kais1/2016_kais1_experiments.tar.g
As a first visual experiment we investigate the steady-state behaviour of the
Inductive Miner [30]. For both D_C and D_A we use the Lossy Counting scheme (Section 5.1). To create an event stream, we created a timed Coloured Petri Net [26] in CPN-Tools [27] which simulates the BPMN model depicted in Fig. 2 and emits the corresponding events. The event stream, and all other event streams used for experiments, are free of noise. The model is able to simulate multiple cases being executed simultaneously. The ProM streaming framework [49, 51] is used to generate an event stream out of the process model.

In Fig. 8 we show the behaviour of the Inductive Miner over time, configured with |D_C| = 75, |D_A| = 75, based on a random simulation of the CPN model. Initially (Model 1) the Inductive Miner only observes a few directly follows relations, all executed in sequence. After a while (Model 2) the Inductive Miner observes that there is a choice between Prepare acceptance pack and Reject Application. In Model 3, the first signs of parallel behaviour of the activities Appraise property, Check credit history and Assess loan risk become apparent. However, not enough behaviour is emitted onto the stream to effectively observe the parallel behaviour yet. In Model 4, we identify a large block of activities within a choice construct. Moreover, an invisible transition loops back into this block. The Inductive Miner tends to show this type of behaviour given an incomplete directly follows abstraction. Finally, after enough behaviour is emitted onto the stream, Model 5 shows a Petri net version of the example process model of Fig. 2.

Fig. 8 shows that the Inductive Miner is able to find the original model based on the event stream. We now focus on comparing the Inductive Miner with the other algorithms described in the paper. All discovery techniques discover a Petri net or some alternative process model that we can convert to a Petri net. The techniques however differ in terms of guarantees w.r.t. the resulting process model. The Inductive Miner guarantees that the resulting Petri nets are sound, whereas the ILP Miner and the Transition System Miner do not necessarily yield sound process models. To perform a proper behavioural comparative analysis, the soundness property is often a prerequisite. Hence, we perform a structural analysis of all the algorithms by measuring structural properties of the resulting Petri nets.

Using the off-line variant of each algorithm we first compute a reference Petri net. We generated an event log L which contains enough behaviour such that the discovered Petri nets describe all behaviour of the BPMN model of Fig. 2. Based on the reference Petri net we create a 15-by-15 matrix in which each row/column corresponds to an activity in the BPMN model. If, in the Petri net, two labelled transitions are connected by means of a place, the corresponding cells in the matrix get value 1. For example, given the first Petri net of Fig. 8, the labels start and Check application completeness (in the figure this is "Check appl") are connected by means of a place.
Hence, the distance between the two labels is set to 1 in the corresponding matrix. If two transitions are not connected, the corresponding value is set to 0.

Using an event stream based on the CPN model, after each newly received event, we use each algorithm to discover a Petri net. For each Petri net we construct the 15-by-15 matrix. We apply the same procedure as applied on the reference model. However, if in a discovered Petri net a certain label is not present, we set all cells in the corresponding row/column to −1, e.g. in model 1 of Fig. 8 there is no transition labelled end, thus the corresponding row and column consist of −1 values. Given a matrix M based on the streaming variant of an algorithm, we compute the distance to the reference matrix M_R as: d(M, M_R) = sqrt(Σ_{i,j ∈ {1, 2, ..., 15}} (M(i, j) − M_R(i, j))²). For all algorithms, the internal data structures used were based on Lossy Counting, with size 100.

Since the Inductive Miner and the α-Miner are completely based on the same abstraction, we expect them to behave similarly. Hence, we plot their corresponding results together in Fig. 9a. Interestingly, the distance metric follows the same pattern for both algorithms. Initially, there is a steep decline in the distance metric, after which it becomes zero. This means that the reference matrix equals the matrix based on the discovered Petri net. The distance shows some peaks in the area between 400 and 1000 received events. Analyzing the resulting Petri nets at these points in time showed that some activities were not present in the Petri nets at those points. The results for
(a) Distances for α and IM. (b) Distances for TS, ILP and HM.
Figure 9: Distance measurements for the α-Miner, Inductive Miner (IM), ILP Miner (ILP), Transition Systems Miner (TS) and Heuristics Miner (HM).

the Transition Systems Miner (TS), the ILP Miner and the Heuristics Miner are depicted in Fig. 9b. We observe that the algorithms behave similarly to the α- and Inductive Miner, which intuitively makes sense as the algorithms all have the same data structure capacity. However, the peaks in the distance metric occur at different locations. For the Heuristics Miner this is explained by the fact that it takes frequency into account and thus uses the directly follows abstraction differently. The Transition System Miner and the ILP Miner use different abstract representations and have a different update mechanism than the directly follows abstraction, i.e. they always update their abstraction, whereas the directly follows abstraction only updates if, for a given case, we already received a preceding activity.

Although the previous experiments provide interesting insights w.r.t. the functioning of the algorithms in a streaming setting, they only consider structural model quality. A distance value of 0 in Fig. 9 indicates that the resulting model is very similar to the reference model. It does not guarantee that the model is in fact equal to, or entails the same behaviour as, the reference model. Hence, in this section we focus on measuring quantifiable similarity in terms of behaviour. We use the Inductive Miner as it provides formal guarantees w.r.t. initialization and termination of the resulting process models. This in particular is a requirement to measure behavioural similarity in a reliable manner. We adapt the Inductive Miner to a streaming setting by instantiating the S-BAR framework, using the scheme described in Section 5.1, combined with the modifications described in Section 5.1.3. For finding start and end activities we traverse the directly follows abstraction and select activities that have no predecessor, or successor, respectively.
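Looking back at the structural comparison above, the matrix construction and distance computation can be sketched as follows. A minimal sketch; the function names are illustrative, and `connected_pairs` stands for the set of (label, label) pairs connected via a place in a discovered Petri net.

```python
import math

def connectivity_matrix(connected_pairs, labels):
    """Build the connectivity matrix: 1 if two labelled transitions are
    connected via a place, 0 otherwise; all cells in the row/column of a
    label absent from the discovered net are set to -1."""
    present = {a for pair in connected_pairs for a in pair}
    m = {}
    for a in labels:
        for b in labels:
            if a not in present or b not in present:
                m[(a, b)] = -1
            else:
                m[(a, b)] = 1 if (a, b) in connected_pairs else 0
    return m

def matrix_distance(m, m_ref, labels):
    """Euclidean distance d(M, M_R) between two connectivity matrices."""
    return math.sqrt(sum((m[(a, b)] - m_ref[(a, b)]) ** 2
                         for a in labels for b in labels))
```

In the experiments this distance is recomputed after every received event, against the matrix of the off-line reference Petri net.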
We again use Lossy Counting [33] to implement both D_C and D_A (Algorithm 2, Section 5.1).

We assess under what conditions the Inductive Miner instantiation is able to discover a process model with the same behaviour as the BPMN model in Fig. 2. In the experiment, after each received event, we query the miner for its current result and compute replay fitness and precision measures based on a complete corresponding event log. In Fig. 10 the results are presented for varying capacity sizes of the underlying data structure (Lossy Counting).

[Figure 10: Replay fitness and precision measures based on applying the Stream Inductive Miner, for (a, c) |D_C| = 25, |D_A| = 25 and (b, d) |D_C| = 75, |D_A| = 75: increasing memory helps to improve fitness and precision.]

For the smallest data structure sizes, i.e. Fig. 10a, we identify that the replay fitness does not stabilize. When the data structure size increases, i.e. Fig. 10b, we identify the replay fitness to reach a value of 1 rapidly. The high variability in the precision measurements present in Fig. 10c suggests that the algorithm is not capable of storing the complete directly follows abstraction. As a result, the Inductive Miner tends to create flower-like patterns, thus greatly under-fitting the actual process. The stable pattern present in Fig. 10d suggests that the sizes used within the experiment are sufficient to store the complete directly follows abstraction. Given that the generating process model is within the class of re-discoverable process models of the Inductive Miner, a replay fitness and a precision value of 1 together indicate that the model is completely discovered by the algorithm.

In the previous experimental setting, we chose to use the same capacity for both D_C and D_A. Here we study the influence of the individual sizes of D_C and D_A.
In Fig. 11 we depict the results of two different experiments in which we fixed the size of one of the two data structures and varied the size of the other data structure. Fig. 11a depicts the results for a fixed value |D_C| = 100 and varying sizes |D_A| = 10, ..., 50. Fig. 11b depicts the results for a fixed value |D_A| = 100 and varying sizes |D_C| = 10, ..., 50.

[Figure 11: Replay fitness measures for the Stream Inductive Miner; (a) |D_C| = 100, |D_A| = 10, ..., 50, (b) |D_C| = 10, ..., 50, |D_A| = 100.]

As the results show, the lack of convergence to a replay fitness value of 1 mostly depends on the size of D_A and is relatively independent of the size of D_C. Intuitively this makes sense, as we only need one entry (c, a) ∈ D_C to deduce a > b, given that the newly received event is (c, b). Even if case c is dropped at some point in time, and reinserted later, information regarding the directly follows abstraction can still be deduced. However, if not enough space is reserved for the D_A data structure, then the data structure is incapable of storing the complete directly follows abstraction.

In the previous experiments we focused on a process model that describes observed steady-state behaviour, i.e. the process model from which events are sampled does not change during the experiments. In this section we assess to what extent the Inductive Miner based instantiation of the framework is able to handle concept drift [11, 42]. We focus on gradual drift, i.e. the behaviour of the process model changes at some point in time, though the change is only applicable for new cases; already active cases follow the old behaviour. In order to obtain a gradual drift, we manipulated the CPN simulation model of the process model presented in Fig. 2. The first five hundred cases that are simulated follow the original model. All later cases are routed to a model in which we swap the parallel and choice structures within the model (Fig. 12).

Fig. 13 depicts the results of applying the Inductive Miner on the described gradual drift. In Fig. 13a we depict the results using data structure sizes |D_C| = 100 and |D_A| = 50 (Lossy Counting). The blue solid line depicts the replay fitness w.r.t.
an event log containing behaviour prior to the drift, the red dashed line represents replay fitness w.r.t. an event log containing behaviour after the drift. We observe that the algorithm again needs some time to stabilize in terms of behaviour w.r.t. the pre-drift model. Interestingly, at the moment that the algorithm seems to be stabilized w.r.t. the pre-drift model, the replay fitness w.r.t. the post-drift model fluctuates. This indicates that the algorithm is not able to fully rediscover the pre-drift model, yet it produces a generalizing model which includes more behaviour, i.e. even behaviour that is part of the post-drift model. The first event in the stream related to the new execution of the process is the 6.000th event. Indeed, the blue solid line drops around this point in Fig. 13a. Likewise, the red dashed line rapidly increases to value 1.0.

[Figure 12: Changes made to the business process model presented in Fig. 2; (a) parallel to choice, (b) choice to parallel.]

[Figure 13: Replay fitness measures for the Stream Inductive Miner, given an event stream containing concept drift; (a) |D_C| = 100, |D_A| = 50, (b) |D_C| = 100, |D_A| = 100.]

Finally, around event 15.000 the replay fitness w.r.t. the pre-drift model stabilizes completely, indicating that the prior knowledge related to the pre-drift model is completely erased from the underlying data structure. In Fig. 13b we depict results for the Inductive Miner using sizes |D_C| = 100 and |D_A| = 100. In this case we observe more stable behaviour, i.e. both the pre- and post-model behaviour stabilizes quickly. Interestingly, due to the use of a bigger k-value of the Lossy Counting algorithm, the drift is reflected longer in the replay fitness values. Only after roughly the 30.000th event does the replay fitness w.r.t. the pre-drift model stabilize.

The main goal of the performance evaluation is to assess whether memory usage and processing times of the implementations are acceptable. As the implementations are of a prototypical fashion, we focus on trends in processing time and
[Figure 14: Performance measurements based on the Stream Inductive Miner; (a) processing times in nanoseconds, (b) memory usage in bytes.]

Table 2: Aggregate performance measures for the Stream Inductive Miner.

                                 25x25   50x50   75x75
  Avg. processing time (ns.):
  Stdev. processing time (ns.):
  Avg. memory usage (byte):
  Stdev. memory usage (byte):

memory usage. We use three different configurations of the data structure sizes (k): |D_C| = 25 and |D_A| = 25, |D_C| = 50 and |D_A| = 50, and |D_C| = 75 and |D_A| = 75 (represented in the figures as 25x25, 50x50 and 75x75, respectively). We measured the time the algorithm needs to update both D_C and D_A. The memory measured is the combined size of D_C and D_A in bytes. The results of the experiments are depicted in Fig. 14. Both figures depict the total number of events received on the x-axis. In Fig. 14a, the processing time in nanoseconds is shown on the y-axis, whereas in Fig. 14b, the memory usage in bytes is depicted. The aggregates of the experiments are depicted in Table 2.

As Fig. 14a shows, there is no observable increase in processing times as more events have been processed. The average processing time seems to slightly decrease when the window size of the Lossy Counting data structure increases (see Table 2). Intuitively this makes sense, as a bigger window size of the Lossy Counting algorithm implies less frequent cleanup operations.

Like processing time, memory usage of the Lossy Counting data structures does not show an increasing trend (Fig. 14b). In this case however, memory usage seems to increase when the window size of the Lossy Counting algorithm is bigger. Again this makes sense, as fewer cleanup operations imply more active members within the data structures, and hence, a higher memory usage.

For a detailed overview of process mining we refer to [2]. For an overview of models, techniques and algorithms in stream based mining and analysis, e.g. frequency approximation algorithms, we refer to [7, 23, 37]. Little work has been done on the topic of stream-based process discovery, and stream-based process mining in general. The notion of streams of events is not new, i.e. several fields study aspects related to streams of (discrete) events. Compared to the field of Complex Event Processing (CEP) [22], the S-BAR architecture can be seen as an event consumer, i.e.
a decoupled entity that processes the events produced by the underlying system. However, whereas the premise of CEP is towards the design of event-based systems and architectures, this work focuses on the behavioural analysis of such systems. The area of event mining [32] focuses on gaining knowledge from historical event/log data. Although the input data is similar, i.e. streams of system events, the assumptions on the data source are different. Within event mining, data mining techniques such as pattern mining [32, Chpt. 4] are used, as opposed to the techniques used within this paper, i.e. techniques discovering end-to-end process models with associated execution semantics. Also, event mining includes methods for system monitoring, whereas the S-BAR architecture can serve as an enabler for business process monitoring and prediction.

To the best of the authors' knowledge, this paper is the first work that presents a generic architecture for the purpose of event stream based process discovery. As such, the work may be regarded as a generalization and standardization effort of some of the related work mentioned within this section.

In [14] an event stream based variant of the Heuristics Miner is presented. The algorithm uses three internal data structures using both Lossy Counting [33] and Lossy Counting with Budget [41]. The authors use these structures to approximate a causal graph based on an event stream. The authors additionally present a sliding window based approach. Recently, an alternative data structure has been proposed based on prefix-trees [25]. In this work the authors deduce the directly follows abstraction directly from a prefix-tree which is maintained in memory. The main advantage of using prefix-trees is the reduced processing time and memory usage. In [40], Redlich et al. design an event stream based variant of the CCM algorithm [39].
The authors identify the need to compute dynamic footprint information based on the event stream, which can be seen as the abstract representation used by CCM. The dynamic footprint is translated to a process model using a translation step called Footprint Interpretation. The authors additionally apply an ageing factor to the collected trace information to fade out the behaviour extracted from older traces. Although the authors define event streams similarly to this paper, the evaluation relies heavily on the concept of completed traces. In [13] Burattin et al. propose an event stream based process discovery algorithm to discover declarative process models. The structure described to maintain events and their relation to cases is comparable with the one used in [14]. The authors present several declarative constraints that can be updated on the basis of newly arriving events, instead of an event log consisting of full traces.
In this section we discuss interesting phenomena observed during experimentation which should be taken into account when adopting the architecture presented in this paper, and in event stream based process discovery in general. We discuss limitations w.r.t. the complexity of abstract representation computation, and discuss the impact of the absence of trace initialization and termination information.
There are limitations w.r.t. the algorithms we are able to adopt using abstract representations as a basis. This is mainly related to the computation of the abstract representation within the conventional algorithm.

As an example, consider the α+-algorithm [34] which extends the original α-Miner such that it is able to handle self-loops and length-1-loops. For handling self-loops, the α+-algorithm traverses the event log and identifies activities that are within a self-loop. Subsequently it removes these from the log and after that calculates the directly follows abstraction. For example, if L = [⟨a, b, c⟩, ⟨a, b, b, c⟩], the algorithm will construct L' = [⟨a, c⟩²] and compute directly follows metrics based on L'.

In a streaming setting we are able to handle this as follows. Whenever we observe some activity a to be in a self-loop and want to generate the directly follows abstraction, then for every (a', a) ∈ D_A and (a, a'') ∈ D_A, s.t. a ≠ a' and a ≠ a'', we deduce that (a', a'') is part of the directly follows abstraction, whereas (a, a), (a', a) and (a, a'') are not. Although this procedure approximates the directly follows relation on the event stream, a simple example shows that the relation is not always equal.

[Figure 15: Two abstract representations; (a) event log, (b) event stream.]

Imagine a process P = {⟨a, b, b, c⟩, ⟨a, e, b, d⟩}. Clearly, any noise-free event log over this process is just a multiset over the two traces in P. In case of the conventional α+-algorithm, removing the b-activity leads to the two traces ⟨a, c⟩ and ⟨a, e, d⟩. Consider the corresponding directly follows abstraction, depicted in Fig. 15a. Observe that all possible directly follows pairs that we are able to observe on any stream over P are: (a, b), (a, e), (b, b), (b, c), (b, d), (e, b). Applying the described procedure yields the abstraction depicted in Fig. 15b.
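The self-loop bypass procedure described above can be sketched as follows. A minimal sketch over a set of directly-follows pairs; the function name is illustrative. Applied to the stream pairs of process P, it reproduces the abstraction of Fig. 15b, including the spurious pairs (a, d) and (e, c).

```python
def remove_self_loops(df_pairs):
    """Approximate the alpha+ self-loop preprocessing on a directly-follows
    abstraction obtained from a stream: for every activity x in a self-loop,
    connect each predecessor of x to each successor of x, and drop all pairs
    involving x."""
    looping = {a for (a, b) in df_pairs if a == b}
    kept = {(a, b) for (a, b) in df_pairs
            if a not in looping and b not in looping}
    for x in looping:
        preds = {a for (a, b) in df_pairs if b == x and a != x}
        succs = {b for (a, b) in df_pairs if a == x and b != x}
        kept |= {(p, s) for p in preds for s in succs}  # bypass x
    return kept
```

The cross product of predecessors and successors is exactly where the spurious relations arise: the pairing information of individual traces is lost once only directly-follows pairs are maintained.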
Due to the information that is lost by only maintaining directly follows pairs, we deduce the non-existing relations (a, d) and (e, c). In general, it is preferable to adopt an abstraction-based algorithm that constructs the abstract representation in one pass over the event log.

For the definitions presented in this paper, we abstract from trace initialization and/or termination, i.e. we do not assume the existence of explicit start/end events. Apart from the technical challenges related to finding these events, i.e. as described in Section 5.1.1 regarding the start/end activity sets used by the α-Miner and Inductive Miner, this can have a severe impact on computing the abstract representation as well.

If we assume the existence and knowledge of unique start and end activities, adopting any algorithm to cope with this type of knowledge is trivial. We only consider cases of which we identify a start event and we only remove knowledge related to cases of which we have seen the end event. The only challenge is to cope with the need to remove an unfinished case due to memory issues, i.e. how to incorporate this deletion into the data structure/abstract representation that is approximated.

If we do not assume and/or know of the existence of start/end activities, whenever we encounter a case for which our data structure indicates that we have not seen it before, this case is identified as being a "new case". Similarly, whenever we decide to drop a case from a data structure, we implicitly assume that this case has terminated. Clearly, when there is a long period of inactivity, a case might be falsely assumed to be terminated. If the case becomes active again, it is treated as a new case. The experiments reported on in Fig. 11 show that, in case of the directly follows abstraction, this type of behaviour has limited impact on the results. However, in a more general sense, e.g. when approximating a prefix-closure on an event stream, this type of behaviour might be of greater influence w.r.t.
resulting model. The ILP Miner likely suffers from such errors and, as a result, produces models of inferior quality.

In fact, for the ILP Miner the concept of termination is of particular importance. To guarantee a single final state of the process model, the ILP Miner needs to be aware of completed traces. In an event stream setting this corresponds to explicit knowledge of when a case is terminated. As in the case of initialization, the resulting models of the ILP Miner are greatly influenced by a faulty assumption on case termination.

Conclusion

In this paper, we presented a generic architecture that allows for adopting existing process discovery algorithms in an event stream setting. The architecture is based on the observation that many existing process discovery algorithms translate a given event log into some abstract representation and subsequently use this representation to discover a process model. Thus, in an event-stream-based setting, it suffices to approximate the abstract representation using the event stream in order to apply existing process discovery algorithms to streams of events. The exact behaviour present in the resulting process model greatly depends on the instantiation of the underlying techniques that approximate the abstract representation.

Several instantiations of the architecture have been implemented in the process mining tool-kits ProM and RapidProM. We primarily focused on abstract representation approximations using algorithms designed for the purpose of frequent item mining on data streams. We structurally evaluated and compared five different instantiations of the framework. From a behavioural perspective we focused on the Inductive Miner, as it guarantees to produce sound workflow nets. The experiments show that the instantiation is able to capture process behaviour originating from a steady-state process. Moreover, convergence of replay fitness to a stable value depends on the parametrization of the internal data structure.
In case of concept drift, the size of the internal data structure in use impacts both model quality and the detected drift point. We additionally studied the performance of the Inductive Miner instantiation. The experiments show that both the processing time of new events and the memory usage are non-increasing as more data is received.
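As a concrete illustration of the frequent item mining algorithms used to approximate abstract representations, the sketch below shows Lossy Counting (Manku and Motwani) maintaining approximate counts of directly follows pairs over a stream. The epsilon value and the per-case bookkeeping are illustrative assumptions, not the exact parametrization or implementation evaluated in the experiments.

```python
# Minimal Lossy Counting sketch over directly-follows pairs.
import math

class LossyCounter:
    def __init__(self, epsilon=0.25):
        self.width = math.ceil(1.0 / epsilon)  # bucket width = ceil(1/eps)
        self.n = 0           # number of items seen so far
        self.entries = {}    # item -> (count, delta)

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        count, delta = self.entries.get(item, (0, bucket - 1))
        self.entries[item] = (count + 1, delta)
        if self.n % self.width == 0:  # bucket boundary: prune rare items
            self.entries = {k: (c, d) for k, (c, d) in self.entries.items()
                            if c + d > bucket}

# Approximating the directly-follows abstraction on an event stream:
# remember the last activity per case and count (previous, current) pairs.
counter, last = LossyCounter(epsilon=0.25), {}
stream = [(c, act) for c in (1, 2, 3) for act in ("a", "b", "b", "c")]
for case, activity in stream:
    if case in last:
        counter.add((last[case], activity))
    last[case] = activity
# Each maintained count underestimates the true count by at most
# epsilon * n, while memory stays bounded by the pruning step.
```

The pruning at bucket boundaries is what gives the structure its bounded memory footprint, at the price of the approximation error discussed in the evaluation.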
Future Work
Within the experiments we chose to limit the internal data structure to the Lossy Counting based approach. However, more instantiations, i.e. based on the Frequent and Space Saving algorithms, have been presented and implemented. We plan to investigate the impact of these different designs of the internal data structure w.r.t. both behaviour and performance.

The architecture presented in this work focuses on approximating abstract representations and exploiting existing algorithms to discover a process model. However, the bulk of the work might be performed multiple times, i.e. several new events emitted onto the stream might not change the abstract representation. We therefore plan to conduct a study towards a completely incremental instantiation of the architecture, i.e. can we immediately identify whether new data changes the abstraction, or even the resulting model?

Another interesting direction for future work is to go beyond control-flow discovery, i.e. can we lift conformance checking, performance analysis, etc. to the domain of event streams? In such cases we might need to store more information, e.g. all attributes related to the events within the cases seen so far. We plan to investigate the application of lossless/lossy compression of the data seen so far, e.g. using frequency distributions of activities/attributes to encode sequences in a compact manner.

References

[1] Aalst, W.M.P. van der: The Application of Petri Nets to Workflow Management. Journal of Circuits, Systems, and Computers (1), 21–66 (1998). DOI 10.1142/S0218126698000043
[2] Aalst, W.M.P. van der: Process Mining - Data Science in Action, Second Edition. Springer (2016). DOI 10.1007/978-3-662-49851-4
[3] Aalst, W.M.P. van der, Bolt, A., Zelst, S.J. van: RapidProM: Mine Your Processes and Not Just Your Data. CoRR abs/1703.03740 (2017)
[4] Aalst, W.M.P. van der, Rubin, V., Verbeek, H.M.W., Dongen, B.F. van, Kindler, E., Günther, C.W.: Process Mining: A Two-Step Approach to Balance Between Underfitting and Overfitting.
Software and Systems Modeling (1), 87–111 (2010). DOI 10.1007/s10270-008-0106-z
[5] Aalst, W.M.P. van der, Weijters, T., Maruster, L.: Workflow Mining: Discovering Process Models from Event Logs. IEEE Trans. Knowl. Data Eng. (9), 1128–1142 (2004). DOI 10.1109/TKDE.2004.47
[6] Aggarwal, C.C.: On Biased Reservoir Sampling in the Presence of Stream Evolution. In: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB '06, pp. 607–618. VLDB Endowment (2006)
[7] Aggarwal, C.C. (ed.): Data Streams, Advances in Database Systems, vol. 31. Springer US (2007). DOI 10.1007/978-0-387-47534-9
[8] Badouel, E., Bernardinello, L., Darondeau, P.: Petri Net Synthesis. Texts in Theoretical Computer Science. An EATCS Series. Springer (2015). DOI 10.1007/978-3-662-47967-4
[9] Bergenthum, R., Desel, J., Lorenz, R., Mauser, S.: Process Mining Based on Regions of Languages. In: Business Process Management, 5th International Conference, BPM 2007, Brisbane, Australia, September 24-28, 2007, Proceedings, pp. 375–383 (2007). DOI 10.1007/978-3-540-75183-0
(1), 154–171 (2014). DOI 10.1109/TNNLS.2013.2278313
[12] Buijs, J.C.A.M., Dongen, B.F. van, Aalst, W.M.P. van der: Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization and Simplicity. Int. J. Cooperative Inf. Syst. (1) (2014). DOI 10.1142/S0218843014400012
[13] Burattin, A., Cimitile, M., Maggi, F.M., Sperduti, A.: Online Discovery of Declarative Process Models from Event Streams. IEEE Trans. Services Computing (6), 833–846 (2015). DOI 10.1109/TSC.2015.2459703
[14] Burattin, A., Sperduti, A., Aalst, W.M.P. van der: Control-Flow Discovery from Event Streams. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2014, Beijing, China, July 6-11, 2014, pp. 2420–2427 (2014)
[15] Carmona, J., Cortadella, J.: Process Discovery Algorithms Using Numerical Abstract Domains. IEEE Trans. Knowl. Data Eng. (12), 3064–3076 (2014).
DOI 10.1109/TKDE.2013.156
[16] Cormode, G., Hadjieleftheriou, M.: Methods for Finding Frequent Items in Data Streams. The VLDB Journal (1), 3–20 (2009). DOI 10.1007/s00778-009-0172-z
[17] Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward Decay: A Practical Time Decay Model for Streaming Systems. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 138–149 (2009). DOI 10.1109/ICDE.2009.65
[18] Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R.H., Raman, R. (eds.) Algorithms - ESA 2002, 10th Annual European Symposium, Rome, Italy, September 17-21, 2002, Proceedings, Lecture Notes in Computer Science, vol. 2461, pp. 348–360. Springer (2002). DOI 10.1007/3-540-45749-6_33
[19] Dongen, B.F. van, Medeiros, A.K.A. de, Verbeek, H.M.W., Weijters, A.J.M.M., Aalst, W.M.P. van der: The ProM Framework: A New Era in Process Mining Tool Support. In: Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, pp. 444–454 (2005). DOI 10.1007/11494744
, 225–242 (2009)
[21] Dumas, M., Rosa, M. La, Mendling, J., Reijers, H.A.: Fundamentals of Business Process Management. Springer (2013). DOI 10.1007/978-3-642-33143-5
[22] Etzion, O., Niblett, P.: Event Processing in Action. Manning Publications Company (2010)
[23] Gama, J.: Knowledge Discovery from Data Streams, 1st edn. Chapman & Hall/CRC (2010). DOI 10.1201/EBK1439826119
[24] Günther, C.W., Aalst, W.M.P. van der: Fuzzy Mining - Adaptive Process Simplification Based on Multi-perspective Metrics. In: Business Process Management, 5th International Conference, BPM 2007, Brisbane, Australia, September 24-28, 2007, Proceedings, pp. 328–343 (2007). DOI 10.1007/978-3-540-75183-0
(3-4), 213–254 (2007). DOI 10.1007/s10009-007-0038-x
[28] Karp, R.M., Shenker, S., Papadimitriou, C.H.: A Simple Algorithm for Finding Frequent Elements in Streams and Bags. ACM Trans. Database Syst.
, 51–55 (2003). DOI 10.1145/762471.762473
[29] Leemans, S.J.J., Fahland, D., Aalst, W.M.P. van der: Discovering Block-Structured Process Models from Event Logs Containing Infrequent Behaviour. In: Business Process Management Workshops - BPM 2013 International Workshops, Beijing, China, August 26, 2013, Revised Papers, pp. 66–78 (2013). DOI 10.1007/978-3-319-06257-0
Lecture Notes in Computer Science, vol. 3272, pp. 151–165. Springer Berlin Heidelberg (2005)
[35] Metwally, A., Agrawal, D., Abbadi, A.: Efficient Computation of Frequent and Top-k Elements in Data Streams. In: Eiter, T., Libkin, L. (eds.) Proceedings of the 10th International Conference on Database Theory, ICDT'05, pp. 398–412. Springer-Verlag, Berlin, Heidelberg (2005). DOI 10.1007/978-3-540-30570-5
[36] Murata, T.: Petri Nets: Properties, Analysis and Applications. Proceedings of the IEEE (4), 541–580 (1989)
[37] Muthukrishnan, S.: Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science (2) (2005). DOI 10.1561/0400000002
[38] Object Management Group: Business Process Model and Notation (BPMN). Formal Specification formal/2011-01-03, Object Management Group (2011)
[39] Redlich, D., Molka, T., Gilani, W., Blair, G., Rashid, A.: Constructs Competition Miner: Process Control-Flow Discovery of BP-Domain Constructs. pp. 134–150 (2014). DOI 10.1007/978-3-319-10172-9
(1), 37–57 (1985). DOI 10.1145/3147.3165
[44] Weerdt, J. de, Backer, M. de, Vanthienen, J., Baesens, B.: A Multi-Dimensional Quality Assessment of State-of-the-Art Process Discovery Algorithms Using Real-Life Event Logs. Inf. Syst. (7), 654–676 (2012). DOI 10.1016/j.is.2012.02.004
[45] Weijters, A.J.M.M., Aalst, W.M.P. van der: Rediscovering Workflow Models from Event-Based Data Using Little Thumb. Integrated Computer-Aided Engineering (2), 151–162 (2003)
[46] Weijters, A.J.M.M., Aalst, W.M.P. van der, Medeiros, A.K.A. de: Process Mining with the HeuristicsMiner-Algorithm.
BETA Working Paper Series, WP 166, Eindhoven University of Technology (2006)
[47] Weijters, A.J.M.M., Ribeiro, J.T.S.: Flexible Heuristics Miner (FHM). In: Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium on, pp. 310–317 (2011). DOI 10.1109/CIDM.2011.5949453
[48] Werf, J.M.E.M. van der, Dongen, B.F. van, Hurkens, C.A.J., Serebrenik, A.: Process Discovery Using Integer Linear Programming. Fundam. Inform. (3-4), 387–412 (2009)
[49] Zelst, S.J. van, Burattin, A., Dongen, B.F. van, Verbeek, H.M.W.: Data Streams in ProM 6: A Single-Node Architecture. In: Proceedings of the BPM Demo Sessions 2014, Co-located with the 12th International Conference on Business Process Management (BPM 2014), Eindhoven, The Netherlands, September 10, 2014, p. 81 (2014)
[50] Zelst, S.J. van, Dongen, B.F. van, Aalst, W.M.P. van der: Avoiding Over-Fitting in ILP-Based Process Discovery. In: Business Process Management - 13th International Conference, BPM 2015, Innsbruck, Austria, August 31 - September 3, 2015, Proceedings, pp. 163–171 (2015). DOI 10.1007/978-3-319-23063-4