Grand Challenge: StreamLearner – Distributed Incremental Machine Learning on Event Streams
Christian Mayer, Ruben Mayer, and Majd Abdo
Institute for Parallel and Distributed Systems, University of Stuttgart, [email protected]
ABSTRACT
Today, massive amounts of streaming data from smart devices need to be analyzed automatically to realize the Internet of Things. The Complex Event Processing (CEP) paradigm promises low-latency pattern detection on event streams. However, CEP systems need to be extended with Machine Learning (ML) capabilities such as online training and inference in order to be able to detect fuzzy patterns (e.g. outliers) and to improve pattern recognition accuracy during runtime using incremental model training. In this paper, we propose a distributed CEP system denoted as StreamLearner for ML-enabled complex event detection. The proposed programming model and data-parallel system architecture enable a wide range of real-world applications and allow for dynamically scaling up and out system resources for low-latency, high-throughput event processing. We show that the DEBS Grand Challenge 2017 case study (i.e., anomaly detection in smart factories) integrates seamlessly into the StreamLearner API. Our experiments verify scalability and high event throughput of StreamLearner.
CCS CONCEPTS
• Computing methodologies → Vector / streaming algorithms; Distributed programming languages; Machine learning; • Theory of computation → Streaming models; • Software and its engineering → API languages;
KEYWORDS
Complex Event Processing, Machine Learning, Stream Processing
ACM Reference format:
Christian Mayer, Ruben Mayer, and Majd Abdo. 2017. Grand Challenge: StreamLearner – Distributed Incremental Machine Learning on Event Streams. In Proceedings of DEBS '17, Barcelona, Spain, June 19-23, 2017. DOI: https://doi.org/10.1145/3093742.3095103
In recent years, the surge of
Big Streaming Data being available from sensors [12], social networks [17], and smart cities [3], has led to a shift of paradigms in data analytics throughout all disciplines. Instead of batch-oriented processing [8, 14, 15], stream-oriented data analytics [7] is becoming the gold standard. This has led to the development of scalable stream processing systems that implement the relational query model of relational database management systems (RDBMS) as continuous queries on event streams [2], and
Complex Event Processing systems that implement pattern matching on event streams [6]. Query-driven stream processing, however, demands a domain expert to specify the analytics logic in a deterministic query language with a query that exactly defines which input events are transformed into which output events by an operator. However, an explicit specification is not always possible, as the domain expert might rather be interested in a more abstract query such as "Report me all anomalies that molding machine 42 experiences on the shopfloor." In this example, it is infeasible to explicitly specify all event patterns that can be seen as an anomaly.

There have been different proposals on how to deal with this issue. EP-SPARQL employs background ontologies to empower (complex) event processing systems with stream reasoning [1] – while focusing on the SPARQL query language. On the other hand, several general-purpose systems for stream processing exist such as Apache Kafka [11], Apache Flink [4], Apache Storm [19], and Apache Spark Streaming [22]. Although these systems are powerful and generic, they are not tailored towards parallel and scalable incremental model training and inference on event streams. At the same time, an increasing body of research addresses incremental (or online) updates of Machine Learning (ML) models: there are incremental algorithms for all kinds of ML techniques such as support vector machines [5], neural networks [9], or Bayesian models [21]. Clearly, a stream processing framework supporting intuitive integration of these algorithms would be highly beneficial – saving the costs of hiring expensive ML experts to migrate these algorithms to the stream processing systems.

In this paper, we ask the question: how can we combine event-based stream processing (e.g., for pattern recognition) with powerful Machine Learning functionality (e.g., to perform anomaly detection) in a way that is compatible with existing incremental ML algorithms? We propose the distributed event processing system StreamLearner that decouples expertise of Machine Learning from Distributed CEP using a general-purpose modular API. In particular, we provide the following contributions.

• An architectural design and programming interface for data-parallel CEP that allows for easy integration of existing incremental ML algorithms (cf. Section 3).
• An algorithmic solution to the problems of incremental K-Means clustering and Markov model training in the context of anomaly detection in smart factories (cf. Section 4).
• An evaluation showing scalability of the StreamLearner architecture and throughput of up to 500 events per second using our algorithms for incremental ML model updates (cf. Section 5).
Machine Learning algorithms train a model using a given set of training data, e.g., building clusters, and then apply the trained model to solve problems, e.g., classifying unknown events. As streaming data becomes available from sensors, models need to be dynamically adapted. That means that new data is taken into account in the learned model, while old data "fades out" and leaves the model as it becomes irrelevant. This can be modeled by a sliding window over the incoming event streams: events within the window are relevant for the model training, whereas events that fall out of the window become irrelevant and should not be reflected in the model any longer. Machine Learning on sliding windows is also known as non-stationary Machine Learning, i.e., the problem of keeping a model updated while the underlying streaming data generation "process" is subject to a changing probability distribution. To adapt the ML model online, there are different possibilities. For instance, incremental algorithms change the model in a step-by-step fashion. The challenge in doing so is to support incremental processing – i.e., streaming learning. The model should not be rebuilt from scratch for every new window, but rather incrementally be updated with new data while old data is removed.

Another challenge of ML on streaming data is that data from different streams might lead to independent models. For instance, data captured in one production machine might not be suitable to train the model of another production machine. The challenge is to determine which independent models shall be built based on which data from which incoming event streams. Further, the question is how to route the corresponding events to the appropriate model. When these questions are solved, the identified machine learning models can be built in parallel – enabling scalable, low-latency, and high-throughput stream processing.
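To make the notion of incremental, window-based model maintenance concrete, consider the following minimal sketch (illustrative Python, not part of StreamLearner; it uses a count-based window for simplicity, whereas the case study below uses time-based windows). A running mean is updated as new events arrive and as old events fall out of the window, without ever rebuilding the statistic from scratch:

from collections import deque

class SlidingMean:
    """Running mean over a count-based sliding window, updated incrementally."""

    def __init__(self, window_size):
        self.window = deque()
        self.window_size = window_size
        self.total = 0.0

    def update(self, value):
        # New data is taken into account in the learned statistic ...
        self.window.append(value)
        self.total += value
        # ... while old data "fades out" once it falls out of the window.
        if len(self.window) > self.window_size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

# Example: feed a stream of sensor readings event by event.
model = SlidingMean(window_size=3)
for reading in [10.0, 12.0, 11.0, 50.0, 13.0]:
    print(model.update(reading))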
In this section, we first give an overview of the StreamLearner architecture, followed by a description of the easy-to-use API for incremental machine learning and situation inference models.
The architecture of StreamLearner is given in Figure 1. In order to parallelize ML-based computation, we have extended the split-process-merge architecture of traditional event-based systems [16–18]. The splitter receives events via the event input stream and forwards them to independent processing units, denoted as tube-ops, according to its splitting logic. Each tube-op atomically performs ML-based incremental stream processing by reading an event from
the in-queue, processing the event, and forwarding the output event to the merger. The merger decides about the final events on the event output stream (e.g. sorts the events from the different tube-ops by timestamp to provide a consistent ordering of the event output stream). Due to the independent processing of events, the architecture supports both scale-up operations, by spawning more threads per machine, and scale-out operations, by adding more machines.

Figure 1: System Architecture.

Each tube-op processes an event in three phases: shaping, training, and inference. In the shaping phase, it performs stateless preprocessing operations ω_1 and ω_2 (denoted as shapers) to transform the input event into appropriate formats. In the training phase, the stateful trainer module incrementally updates the model parameters of model M (e.g. a neural network in Figure 1) according to the user-specified model update function. In the inference phase, the updated model and the preprocessed event serve as an input for the stateful predictor performing a user-defined inference operation and transforming the updated model and the input event to an output event with the model-driven prediction.

Note that the StreamLearner API does not restrict application programmers to perform training and inference on different event data. Hence, application programmers are free to use either disjoint subsets or intersecting subsets of events in the stream for training and inference. Although it is common practice in ML to separate data that is used for training and inference, we still provide this flexibility, as in real-world streams we might use some events for both, incorporating changing patterns into the ML model and initiating an inference event using the predictor. However, the application programmer can also separate training and inference data by defining the operators in the tube-op accordingly (e.g. generating a dummy event as input for the predictor to indicate that no inference step should be performed). Furthermore, the application programmer can also specify whether the training should happen before inference or vice versa.

The application programmer specifies the following functions in order to use the StreamLearner framework in a distributed environment.
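Expressed as code, such a set of user-defined functions could take the following shape. This is a minimal illustrative sketch in Python; the names, signatures, and the abstract-class style are assumptions for exposition and do not reflect the actual StreamLearner implementation. Each of these functions corresponds to one of the operators described next.

from abc import ABC, abstractmethod

class TubeOpFunctions(ABC):
    """Illustrative collection of the user-defined functions of a tube-op."""

    @abstractmethod
    def split(self, event):
        """Return (machine_id, tubeop_id, event): which tube-op receives event e_i."""

    def shape_for_training(self, event):   # corresponds to omega_1; identity by default
        return event

    def shape_for_inference(self, event):  # corresponds to omega_2; identity by default
        return event

    @abstractmethod
    def train(self, model, event):
        """Incrementally update the model and return the (possibly unchanged) model M'."""

    @abstractmethod
    def predict(self, model, event):
        """Apply model M' to the event and return the predicted output event."""

    def merge(self, output_events):
        """Combine tube-op outputs into the output stream, e.g. ordered by timestamp."""
        return sorted(output_events, key=lambda e: e.timestamp)  # assumes a timestamp field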
Given an event e_i, the application programmer defines a stateful splitting function split(e_i) that returns a tuple (mid, tid, e_i) defining the tube-op tid on machine mid that receives event e_i. The stateless shaper operations ω_1(e_i) and ω_2(e_i) return modified events e_i^1 and e_i^2 that serve as input for the trainer and the predictor module. The default shaper performs the identity operation. The stateful trainer operation trainer(e_i^1) returns a reference to the updated model object M′. The application programmer can use any type of machine learning model as long as the model can be used for inference by the predictor. If the model M remains unchanged after processing event e_i, the trainer must return a reference to the unchanged model M in order to trigger the predictor for each event. StreamLearner performs a delaying strategy when the application programmer prefers inference before learning. In this case, the tube-op first executes the predictor on the old model M and executes the trainer afterwards to update the model. The stateful predictor receives a reference to model M′ and input event e_i^2 and returns the predicted event e_i^3 = predictor(M′, e_i^2). The stateful merger receives predicted output events from the tube-ops and returns a sequence of events that is put to the event output stream, i.e., merger(e_i^3) = f(e_1^3, ..., e_j^3, ..., e_i^3) for j < i and any function f. Any aggregator function, event ordering scheme, or filtering method can be implemented by the merger.

In this section, we exemplify usage of our StreamLearner API based on a realistic use case for data analytics posed by the DEBS Grand Challenge 2017 [10]. In smart factories, detecting malfunctioning of production machines is crucial to enable automatic failure correction and timely reactions to bottlenecks in the production line. The goal of this case study is to detect anomalies, i.e., abnormal sequences of sensor events quantifying the state of the production machines. In particular, the input event stream consists of events transporting measurements from a set of production machines P to an anomaly detection operator. The events are created by the set of sensors S that monitor the production machines. We include the time stamps of each measured sensor event by defining a set of discrete time steps DT. Each event e_i = (p_i, d_i, s_i, t_i) consists of a production machine id p_i ∈ P that was monitored, a numerical data value d_i ∈ R quantifying the state of the production machine (e.g. temperature, pressure, failure rate), a sensor with id s_i ∈ S that has generated the event, and a time stamp t_i ∈ DT storing the event creation time.

The anomaly detection operator has to pass three stages for each event-generating sensor (cf. Figure 2).

Figure 2: Case study anomaly detection in smart factories.

First, it collects all events e_i that were generated within the last W time units (denoted as event window) and clusters the events
using the K-means algorithm on the numerical data values d_i for at least M iterations. The standard K-means algorithm iteratively assigns each event in the window to its closest cluster center (with respect to Euclidean distance) and recalculates each cluster center as the centroid of all assigned events' numerical data values (in the following we do not differentiate between events and their data values). In the figure, there are five events e_1, ..., e_5 in the event window that are clustered into three clusters C_1, C_2, C_3. With this method, we can characterize each event according to its state, i.e., the cluster it is assigned to.

Second, the operator trains a first-order Markov model in order to differentiate normal from abnormal event sequences. A Markov model is a state diagram, where a probability value is associated to each state transition. The probability of a state transition depends only on the current state and not on previous state transitions (independence assumption). These probabilities are maintained in a transition matrix T using the following method: (i) The Markov model consists of K states, one state for each cluster. Each event is assumed to be in the state of the cluster it is assigned to. (ii) The events are ordered with respect to their time stamp – from oldest to youngest. Subsequent events are viewed as state transitions. In Figure 2, the five events in the window are sorted by time stamp, and the respective state transitions form the sequence of clusters that these events are assigned to. (iii) The transition matrix contains the probabilities of state transitions between any two states, i.e., cluster centers. The probability of two subsequent events being in cluster C_i and transitioning into cluster C_j, for all i, j ∈ {1, ..., K}, is the relative number of these observations. For example, the probability of a transition from state C_i to state C_j is the number of events in state C_i that transition to state C_j divided by the total number of transitions from state C_i, i.e., P(C_j | C_i) = #(C_i → C_j) / #(C_i → ⋆).

Third, the operator computes the probability of a sequence of observed transitions with length N. In particular, if a series of unlikely state transitions is observed, i.e., the total sequence probability is below the threshold Θ, an event is generated that indicates whether an anomaly has been found. The probability of the sequence can be calculated by breaking the sequence into single state transitions, i.e., for the five events in Figure 2, P(C_{s_1} → C_{s_2} → C_{s_3} → C_{s_4} → C_{s_5}) = P(C_{s_2} | C_{s_1}) ∗ P(C_{s_3} | C_{s_2}) ∗ P(C_{s_4} | C_{s_3}) ∗ P(C_{s_5} | C_{s_4}), where C_{s_k} denotes the cluster of the k-th event in the window. Using the independence assumption of Markov models, we can assign a probability value to each sequence of state transitions and hence quantify its likelihood.
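Put together, the three stages can be sketched in a few lines of illustrative Python, assuming the K-means cluster centers for the current window have already been computed; all names are placeholders rather than the StreamLearner implementation:

import numpy as np

def detect_anomaly(window_values, centers, N, theta):
    """Stage 1: assign each event to its closest cluster center (its state).
    Stage 2: estimate the transition matrix T from subsequent events.
    Stage 3: flag an anomaly if the last N transitions are jointly improbable."""
    K = len(centers)
    states = [int(np.argmin([abs(v - c) for c in centers])) for v in window_values]
    counts = np.zeros((K, K))
    for a, b in zip(states, states[1:]):            # subsequent events = transitions
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    T = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    prob = 1.0
    for a, b in zip(states[-N - 1:], states[-N:]):  # the last N observed transitions
        prob *= T[a, b]
    return prob < theta                             # True indicates an anomaly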
The scenario fits nicely into the StreamLearner API: for each sensor, an independent ML model is subject to incremental training and inference steps. Therefore, each thread in the StreamLearner API is responsible for all observations of a single sensor, enabling StreamLearner to monitor multiple sensors in parallel.
The splitter receives an event e_i = (p_i, d_i, s_i, t_i) and assigns the event exclusively to the thread that is responsible for sensor s_i (or initiates creation of this responsible thread if it does not exist yet). It uses a simple hash map assigning sensor ids to thread ids to provide thread resolution with constant time complexity during processing of the input event stream. With this method, we break the input stream into multiple independent sensor event streams (one stream per sensor). Shapers ω_1 and ω_2 are simply identity operators that pass the event without changes to the respective training or prediction modules. The trainer maintains and updates the model in an incremental fashion. The model is defined via the transition matrix T that is calculated using K-means clustering and the respective state transition sequence.
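The per-sensor splitting logic described above amounts to little more than a dictionary lookup; the following fragment is an illustrative sketch with assumed names:

# Hypothetical splitter state: maps sensor ids to tube-op (thread) ids.
thread_of_sensor = {}
next_thread_id = 0

def split(event):
    """Route event e_i = (p_i, d_i, s_i, t_i) to the thread responsible for sensor s_i."""
    global next_thread_id
    sensor_id = event[2]
    if sensor_id not in thread_of_sensor:      # lazily create the responsible thread id
        thread_of_sensor[sensor_id] = next_thread_id
        next_thread_id += 1
    return thread_of_sensor[sensor_id], event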
Incremental K-Means. The goal is to iteratively assign each event to the closest cluster center and recalculate the cluster center as the centroid of all assigned events. The standard approach is to perform M iterations of the K-means clustering algorithm for all events in the event window when triggered by the arrival of a new event. However, this method results in suboptimal runtime due to unnecessary computations that arise in practical settings:

• A single new event in the event window will rarely have a global impact on the clustering. In particular, most assignments of events to clusters remain unchanged after adding a new event to the event window. Therefore, the brute-force method of full reclustering can result in huge computational redundancies.
• Performing M iterations is unnecessary if the clustering has already converged in an earlier iteration M′ < M. Clearly, we should terminate the algorithm as early as possible.
• The one-dimensional K-means problem is fundamentally easier than the standard NP-hard K-means problem: an optimal solution can be calculated in polynomial time for a fixed number of clusters K and number of events in the window n [13, 20]. Therefore, using a general-purpose K-means algorithm that supports arbitrary dimensionality can result in unnecessary overhead (the trade-off between generality, performance, and optimality).

This is illustrated in Figure 3. There are four clusters C_1, ..., C_4 and events e_1, ..., e_n in the event window, and a new event arrives. Instead of recomputing the whole clustering in each iteration, i.e., calculating the distance between each event and cluster center, we touch only events that are potentially affected by a change of the cluster centers. For example, the new event is assigned to its closest cluster, which leads to a new cluster center. However, the next closest event on the left side keeps its previous cluster center. Our basic reasoning is that each event on the left side of such an unchanged event keeps its cluster center, as there can be no disturbance in the form of changed cluster centers left-hand of it (only a cascading cluster center shift is possible, as C_4 ≥ C_3 ≥ C_2 ≥ C_1 in any phase of the algorithm). A similar argumentation can be made for the right side and also for the removal of events from the window.

Figure 3: Saving computation time in K-Means.

This idea heavily utilizes the possibility of sorting cluster centers and events in the one-dimensional space. It reduces the average runtime of a single iteration of K-means, as in many cases only a small subset of events has to be accessed. Combined with the optimization of skipping further computation after convergence in iteration M′ < M, incremental updates of the clustering can be much more efficient than naive reclustering. The incremental one-dimensional clustering method is in the same complexity class as naive reclustering, as in the worst case we have to reassign all events to new clusters (the sorting of events takes only logarithmic runtime complexity in the event window size per insertion of a new event – hence the complexity is dominated by the K-means computation).
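For reference, a plain (non-incremental) one-dimensional K-means baseline with the early-termination optimization discussed above can be sketched as follows. The window-local pruning of unaffected events that StreamLearner additionally applies is omitted here, and all names are illustrative:

def kmeans_1d(values, centers, max_iter):
    """Plain one-dimensional Lloyd iterations with early termination once the
    clustering has converged in an iteration M' < max_iter."""
    assignment = []
    for _ in range(max_iter):
        # Assign each value to its closest cluster center.
        assignment = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                      for v in values]
        # Recompute each center as the centroid of its assigned values.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, a in zip(values, assignment) if a == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:   # already converged: skip the remaining iterations
            break
        centers = new_centers
    return centers, assignment

# Example: two clusters over a small event window of numerical data values.
centers, labels = kmeans_1d([1.0, 1.2, 5.0, 5.1, 4.9], centers=[0.0, 10.0], max_iter=20)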
Markov Model. The Markov model is defined by the state transition matrix T. Cell (i, j) in the transition matrix T is the probability that two subsequent events transition from cluster C_i (the first event) to cluster C_j (the second event). Semantically, we count the number of state transitions in the event window to determine the relative frequency such that the row values in T sum to one. Instead of complete recomputation of the whole matrix, we only recalculate the rows and columns of clusters that were subject to any change in the K-means incremental clustering method. This ensures that all state transitions are reflected in the model while saving computational overhead. A reference to the new model T is handed to the predictor method that performs inference on the updated model as presented in the following.

The predictor module applies the inference step on the changed model for each incoming event. In this scenario, inference is done via the Markov model (i.e., the transition matrix T) to determine whether an anomaly was detected or not. We use the transition matrix to assign a probability value to a sequence of events with associated states (i.e., cluster centers). The brute-force method would calculate the product of state transition probabilities for each sequence of length N and compare it with the probability threshold Θ. However, this leads to many redundant computations for subsequent events. We present an improved incremental method in Figure 4. The event window consists of events e_1, ..., e_n sorted by time stamps.
Figure 4: Anomaly detection on the event window (example transition matrix T with rows (1/3, 2/3) and (1/4, 3/4), sequence length N = 3, probability threshold θ = 0.1).
Each event is assigned to a cluster C_1 or C_2, resulting in a series of state transitions. We use the transition matrix of the Markov model to determine the probability of each state transition. We calculate the probability of the state transition sequence as the product of all state transition probabilities (the state independence property of Markov models). For instance, the probability Π of the first three state transitions in Figure 4 is the product of the corresponding transition probabilities, Π = 1/3 ∗ 2/3 ∗ 3/4 = 1/6 > Θ = 0.1, so no anomaly is reported. Now we can easily calculate the probability of the next state transition sequence of length N by dividing by the first transition probability of the sequence (i.e., 1/3) and multiplying with the probability of the new state transition; this yields the probability Π′ of the next state transition sequence, which is again compared with Θ. This method reduces the number of multiplications to N + (W − N) rather than N(W − N). Finally, the predictor issues an anomaly detection event to the merger (Yes/No).

The merger sorts all anomaly events w.r.t. timestamp to ensure a consistent output event stream, using the same procedure as in GraphCEP [17]. This method ensures a monotonic increase of event time stamps in the output event stream.
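The incremental computation of consecutive sequence probabilities described above can be sketched as follows (illustrative Python; variable names are assumptions). A running product is divided by the transition probability that leaves the length-N sequence and multiplied by the one that enters it:

def sliding_sequence_probabilities(trans_probs, N, theta):
    """Flag every length-N transition sequence whose probability falls below theta.
    Assumes non-zero transition probabilities so that the division is safe."""
    flags = []
    prob = 1.0
    for p in trans_probs[:N]:                    # first sequence: N multiplications
        prob *= p
    flags.append(prob < theta)
    for i in range(N, len(trans_probs)):
        prob = prob / trans_probs[i - N] * trans_probs[i]  # one division, one multiplication
        flags.append(prob < theta)
    return flags

# E.g. with the reconstructed values of Figure 4: 1/3 * 2/3 * 3/4 = 1/6 > 0.1, no anomaly.
print(sliding_sequence_probabilities([1/3, 2/3, 3/4, 1/4], N=3, theta=0.1))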
In this section, we present our experiments with StreamLearner on the DEBS Grand Challenge 2017 data set with 50,000 sensor data events.
Experimental Setup:
We used the following two computing environments: (i) a notebook with four CPU cores, and (ii) a shared-memory machine with older hardware.

Adapting the window size W: In Figure 5a, we show the absolute throughput of StreamLearner on the y-axis and different window sizes W on the x-axis using the notebook for a different number of threads. Clearly, a larger window size leads to lower throughput as the computational overhead grows. We normalized this data in Figure 5b to the interval [0, 1] to compare the relative throughput improvements for the different numbers of threads. Clearly, the benefit of multi-threading arises only for larger window sizes due to the constant distribution overhead that cannot be compensated by increased parallelism, because each thread has only little computational work between points of synchronization (on the splitter and on the merger). Overall scalability is measured in Figure 5c. It can be seen that StreamLearner scales best for data-parallel problems with relatively little synchronization overhead in comparison to the computational task; for small window sizes, multi-threading therefore yields little benefit.

Figure 5: Throughput evaluations for different window sizes W on notebook: (a) absolute throughput, (b) normalized throughput, (c) scalability.

In Figure 6a, we repeated the experiment on the shared-memory infrastructure. The first observation is that the single-threaded experiments are four times slower compared to the notebook infrastructure due to the older hardware. Nevertheless, in Figure 6b, we can see clearly that the relative throughput decreases when using a low rather than a high number of threads for larger window sizes.

Figure 6: Throughput evaluations for different window sizes on shared memory infrastructure: (a) absolute throughput, (b) normalized throughput, (c) scalability.

Adapting the number of clusters K: In Figure 7, we plot the absolute throughput for a varying number of clusters and different numbers of threads, with a fixed window size.

Figure 7: Throughput for varying number of clusters K.

StreamLearner is a distributed CEP system and API tailored to scalable event detection using Machine Learning on streaming data. Although our API is general-purpose, StreamLearner is especially well-suited to data-parallel problems – with multiple event sources causing diverse patterns in the event streams. For these scenarios, StreamLearner can enrich standard CEP systems with powerful Machine Learning functionality while scaling exceptionally well due to the pipelined incremental training and inference steps on independent models.
REFERENCES
[1] Darko Anicic, Paul Fodor, Sebastian Rudolph, and Nenad Stojanovic. 2011. EP-SPARQL: A Unified Language for Event Processing and Stream Reasoning. In Proceedings of the 20th International Conference on World Wide Web (WWW '11). ACM, New York, NY, USA, 635–644. DOI: https://doi.org/10.1145/1963405.1963495
[2] Arvind Arasu, Shivnath Babu, and Jennifer Widom. 2006. The CQL Continuous Query Language: Semantic Foundations and Query Execution. The VLDB Journal 15, 2 (June 2006), 121–142. DOI: https://doi.org/10.1007/s00778-004-0147-z
[3] Michael Batty. 2013. Big data, smart cities and city planning. Dialogues in Human Geography 3, 3 (2013), 274–279.
[4] Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36, 4 (2015).
[5] Gert Cauwenberghs and Tomaso Poggio. 2001. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems. 409–415.
[6] Gianpaolo Cugola and Alessandro Margara. 2010. TESLA: A Formally Defined Event Specification Language. In Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems (DEBS '10). ACM, New York, NY, USA, 50–61. DOI: https://doi.org/10.1145/1827418.1827427
[7] Gianpaolo Cugola and Alessandro Margara. 2012. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys (CSUR) 44, 3 (2012), 15.
[8] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51, 1 (Jan. 2008), 107–113. DOI: https://doi.org/10.1145/1327452.1327492
[9] Shen Furao, Tomotaka Ogura, and Osamu Hasegawa. 2007. An enhanced self-organizing incremental neural network for online unsupervised learning. Neural Networks 20, 8 (2007), 893–903.
[10] Vincenzo Gulisano, Zbigniew Jerzak, Roman Katerinenko, Martin Strohbach, and Holger Ziekow. 2017. The DEBS 2017 grand challenge. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS '17), Barcelona, Spain, June 19-23, 2017.
[11] Jay Kreps, Neha Narkhede, Jun Rao, and others. 2011. Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB. 1–7.
[12] Narayanan C Krishnan and Diane J Cook. 2014. Activity recognition on streaming sensor data. Pervasive and Mobile Computing 10 (2014), 138–154.
[13] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. 2009. The planar k-means problem is NP-hard. In International Workshop on Algorithms and Computation. Springer, 274–285.
[14] Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, 135–146.
[15] Christian Mayer, Muhammad Adnan Tariq, Chen Li, and Kurt Rothermel. 2016. GrapH: Heterogeneity-Aware Graph Computation with Adaptive Partitioning. In Proc. of IEEE ICDCS.
[16] Ruben Mayer, Boris Koldehofe, and Kurt Rothermel. 2015. Predictable Low-Latency Event Detection with Parallel Complex Event Processing. Internet of Things Journal, IEEE 2, 4 (Aug 2015), 274–286.
[17] Ruben Mayer, Christian Mayer, Muhammad Adnan Tariq, and Kurt Rothermel. 2016. GraphCEP: Real-time Data Analytics Using Parallel Complex Event and Graph Processing. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems (DEBS '16). ACM, New York, NY, USA, 309–316. DOI: https://doi.org/10.1145/2933267.2933509
[18] Ruben Mayer, Muhammad Adnan Tariq, and Kurt Rothermel. 2017. Minimizing Communication Overhead in Window-Based Parallel Complex Event Processing. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS '17). ACM, New York, NY, USA, 12. DOI: https://doi.org/10.1145/3093742.3093914
[19] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, and others. 2014. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 147–156.
[20] Haizhou Wang and Mingzhou Song. 2011. Ckmeans.1d.dp: Optimal k-means clustering in one dimension by dynamic programming. The R Journal 3, 2 (2011), 29.
[21] Robert C Wilson, Matthew R Nassar, and Joshua I Gold. 2010. Bayesian online learning of the hazard rate in change-point problems. Neural Computation 22, 9 (2010), 2452–2476.
[22] Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13). ACM.