A Hybrid Neuro-Symbolic Approach for Complex Event Processing
To appear in EPTCS.
Marc Roig Vilamala, Harrison Taylor
Cardiff University
Tianwei Xing, Luis Garcia, Mani Srivastava
University of California, Los Angeles
Lance Kaplan
CCDC Army Research Laboratory
Alun Preece, Angelika Kimmig
Cardiff University
Federico Cerutti
University of Brescia
Training a model to detect patterns of interrelated events that form situations of interest can be a complex problem: such situations tend to be uncommon, and only sparse data is available. We propose a hybrid neuro-symbolic architecture based on the Event Calculus that can perform Complex Event Processing (CEP). It leverages both a neural network, to interpret inputs, and logical rules, which express the pattern of the complex event. Our approach can train with far less labelled data than a pure neural network approach, and learns to classify individual events even when trained in an end-to-end manner. We demonstrate this by comparing our approach against a pure neural network approach on a dataset based on Urban Sounds 8K.
Imagine a scenario where we are trying to detect a shooting using microphones deployed in a city: shooting is a situation of interest that we want to identify from a high-throughput (audio) data stream. Complex Event Processing (CEP) is a type of approach aimed at detecting such situations of interest, called complex events, from a data stream using a set of rules. These rules are defined on atomic pieces of information from the data stream, which we call events (or simple events, for clarity). Complex events can be formed from multiple simple events. For instance, shooting might start when multiple instances of the simple event gunshot occur. For simplicity, we can assume that when we start to detect siren events, the authorities have arrived and the situation is being dealt with, which concludes the complex event.

We usually cannot write declarative rules directly on the raw data stream, as that would require processing the raw data with symbolic rules; though theoretically possible, this is hardly advisable.

Using a machine learning algorithm such as a neural network trained with back-propagation is also infeasible, as it would need to simultaneously learn to recognise the simple events within the data stream and the interrelationships between such events that compose a complex event. While possible, the sparsity of data makes this a hard problem to solve.

The architecture we propose is a hybrid neuro-symbolic approach that combines the advantages of both. Our approach is capable of performing CEP on raw data after training in an end-to-end manner. Among other advantages, it is better at training with sparse data than pure neural network approaches, as we will demonstrate.

This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-16-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Code is available at https://github.com/MarcRoigVilamala/DeepProbCEP
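To make the shooting example concrete, the sketch below implements the informal pattern from the introduction in plain Python: a complex event starts once enough gunshot events fall within a short time window, and ends analogously with siren events. This is purely illustrative (the paper itself expresses such patterns as logical rules, not imperative code), and all names and thresholds here are our own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: int   # timestamp in seconds
    label: str  # e.g. "gunshot" or "siren"

def complex_event_intervals(events, start_label="gunshot", end_label="siren",
                            repetitions=2, window=5):
    """Return (start, end) intervals of the complex event.

    The complex event starts once `repetitions` events carrying
    `start_label` occur within `window` seconds of each other, and ends
    analogously for `end_label`.
    """
    intervals, start = [], None
    recent = {start_label: [], end_label: []}
    for e in sorted(events, key=lambda e: e.time):
        if e.label not in recent:
            continue  # other simple events are irrelevant to this pattern
        hits = [t for t in recent[e.label] if e.time - t < window] + [e.time]
        recent[e.label] = hits
        if len(hits) >= repetitions:
            if start is None and e.label == start_label:
                start = e.time
            elif start is not None and e.label == end_label:
                intervals.append((start, e.time))
                start, recent = None, {start_label: [], end_label: []}
    return intervals
```

For instance, two gunshots at seconds 1 and 2 followed by sirens at 10 and 11 yield the single interval `(2, 11)`.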
ProbEC [5] is an approach for complex event processing based on probabilistic logic programming. ProbEC takes an input stream of simple events, each of which has an attached probability of having happened. From there, it can output how likely it is that a complex event is happening at a given point in time, based on manually-defined rules that describe the pattern of the complex event.

In a previous paper, we built on top of ProbEC, proposing a system that made use of pre-trained neural networks to detect complex events from CCTV feeds [3]. However, this approach required access to pre-trained neural networks to process the simple events, which are not always available. To solve this issue, we moved towards end-to-end training, which learns these neural networks from just the input data and labels indicating when the complex events happen.

To implement end-to-end training in a hybrid neuro-symbolic approach, we made use of DeepProbLog [2], which incorporates deep learning into a probabilistic logic programming language. DeepProbLog allows users to use the outputs of a neural network as part of the knowledge base of a ProbLog program, and to train those neural networks in an end-to-end manner: it computes the gradients required for gradient descent from the truth value of the performed query and the outputs provided by the neural network. This allows us to implement a hybrid neuro-symbolic architecture that learns end-to-end.
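DeepProbLog exposes a network to the logic program through a neural predicate, an annotation of roughly the form `nn(net, [X], Y, classes) :: sound(X, Y).`, which turns the network's softmax output into a distribution over ground atoms. The Python sketch below illustrates that mapping only; the predicate and class names are our own illustrative choices, not the paper's.

```python
import math

def neural_predicate(logits, classes, timepoint):
    """Conceptual sketch of a DeepProbLog neural predicate: turn a
    network's raw scores into probabilistic facts, one softmax-weighted
    ground atom per class. Atom names here are illustrative only."""
    exps = [math.exp(x) for x in logits]
    z = sum(exps)
    return {f"sound({timepoint}, {c})": e / z for c, e in zip(classes, exps)}

# Three raw scores over three hypothetical sound classes at second 3:
facts = neural_predicate([2.0, 0.5, 0.1], ["gunshot", "siren", "dog_bark"], 3)
```

The logic layer then reasons over these weighted facts exactly as it would over ordinary probabilistic facts, which is what lets gradients flow back from query outcomes to network weights.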
In this paper, we propose a hybrid neuro-symbolic architecture that performs CEP. As a proof of concept, we have implemented our architecture to perform CEP on audio data. In our implementation, each second of audio is processed by a PyTorch implementation of VGGish, a feature-embedding frontend for audio classification models [1], which outputs a feature vector for each second of the original audio file. We use these features as input to a multi-layer perceptron (MLP; AudioNN in the figure) that classifies the audio into a pre-defined set of classes.

The output of this neural network is then used in the logic layer, which contains the rules required to perform CEP. Here, the user can define which patterns of simple events constitute the starts and ends of which complex events.

DeepProbLog is used to integrate the different parts, allowing us to train the neural network within the system in an end-to-end manner. This heavily reduces the cost of labelling: it is practically infeasible to label each second of a large collection of audio tracks, while it is much easier to identify the beginning and the end of complex events as situations of interest. As such, the system is provided with raw audio data and, for training, with labels on when the complex events start and end.

We experimentally compare our approach against a pure statistical learning approach using a neural network (Pure NN). Pure NN exchanges the logic layer for an MLP, which must learn the rules that define complex events. We engineered a synthetic dataset based on Urban Sounds 8K [4]. We consider two repetitions of the same start event (or end event) within a certain time window as the signal of the beginning (or termination) of a fluent. Then, to test the efficacy of the approach, we varied the size of the time window for repetitions from 2 to 5 seconds. The information on when a complex event begins or ends is used as training data.
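The neural half of the pipeline described above can be sketched as follows: an MLP over VGGish embeddings (128-dimensional, one per second of audio) producing a distribution over the 10 Urban Sounds 8K classes. The hidden-layer size is our own assumption; the paper does not specify the AudioNN architecture in detail.

```python
import torch
import torch.nn as nn

class AudioNN(nn.Module):
    """Sketch of the AudioNN classifier: maps VGGish feature embeddings
    to class probabilities that the logic layer can consume."""

    def __init__(self, embedding_dim=128, hidden_dim=64, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_classes),
            nn.Softmax(dim=-1),  # probabilities, one distribution per second
        )

    def forward(self, embeddings):
        return self.net(embeddings)

model = AudioNN()
probs = model(torch.randn(5, 128))  # 5 seconds of audio -> 5 distributions
```

In the full system these per-second distributions are not consumed directly; they are fed to DeepProbLog's neural predicate, so training signal arrives only through the complex-event labels.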
The goal of our synthetic dataset is to be able to detect when a complex event is happening from raw data. (The VGGish implementation is available at https://github.com/harritaylor/torchvggish.)

Table 1: Accuracy results (average over 10-fold cross-validation) for a hybrid neuro-symbolic architecture (our approach) and a pure neural network approach, both for individual sounds (Sound Accuracy) and for a pattern of two instances of the same sound class (Pattern Accuracy) within the window size. Best in bold.

                                      Window Size
Approach                     2        3        4        5
Sound Accuracy
  Hybrid (our approach)
  Pure NN                    0.0725   0.1155   0.0845   0.0833
Pattern Accuracy
  Hybrid (our approach)
  Pure NN                    0.1843   0.2034   0.2289   0.1927

For all the reported results, the corresponding system has been trained on a sequence generated by randomly ordering the files from 9 of the 10 pre-sorted folds of Urban Sounds 8K, with the remaining fold used for testing. As recommended by the creators of the dataset, 10-fold cross-validation has been used for evaluation. Before each evaluation, both systems have been trained for 10 epochs with 750 training points per epoch, which our experiments show is enough for both approaches to converge.

Table 1 shows the results of our approach and Pure NN with different window sizes: both the performance in detecting the starts and ends of complex events (Pattern Accuracy) and in classifying the simple events in the sequence (Sound Accuracy). As we can see in the table, our approach is clearly superior, as Pure NN performs only marginally better than a random classifier, which would achieve an accuracy of about 10%. Our approach is therefore very efficient at learning from sparse data and, as a by-product, can also train the neural network to classify simple events.
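The evaluation protocol above (shuffle 9 of the 10 pre-sorted folds into one long training sequence, hold the remaining fold out for testing, rotate over all folds) can be sketched as follows; function and variable names are our own.

```python
import random

def make_sequence(fold_files, test_fold, seed=0):
    """Sketch of the sequence construction used for evaluation: files
    from 9 of the 10 pre-sorted Urban Sounds 8K folds are shuffled into
    one training sequence; the held-out fold is kept for testing."""
    rng = random.Random(seed)
    train = [f for i, fold in enumerate(fold_files) if i != test_fold
             for f in fold]
    rng.shuffle(train)
    return train, list(fold_files[test_fold])

# 10-fold cross-validation simply rotates test_fold over range(10).
```

Repeating this for `test_fold` in 0..9 and averaging yields the cross-validated accuracies reported in Table 1.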
In this paper we demonstrated the superiority of our approach over a feedforward neural architecture. Further investigations considering recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, are ongoing; we thank an anonymous reviewer for this suggestion. Further research could also include rule learning, which would remove the need to know in advance which patterns of simple events form which complex events.
References

[1] S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold et al. (2017): CNN architectures for large-scale audio classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, pp. 131–135.
[2] Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester & Luc De Raedt (2018): DeepProbLog: Neural Probabilistic Logic Programming. In: NIPS 2018, pp. 3749–3759.
[3] M. Roig Vilamala, L. Hiley, Y. Hicks, A. Preece & F. Cerutti (2019): A Pilot Study on Detecting Violence in Videos Fusing Proxy Models. In: 2019 22nd International Conference on Information Fusion, pp. 1–8.
[4] J. Salamon, C. Jacoby & J. P. Bello (2014): A Dataset and Taxonomy for Urban Sound Research. In: 22nd ACM International Conference on Multimedia (ACM-MM'14), Orlando, FL, USA, pp. 1041–1044.
[5] A. Skarlatidis, A. Artikis, J. Filippou & G. Paliouras (2015):